CLEAVAGE OF CAULOBACTER PRODUCED
RECOMBINANT FUSION PROTEINS



FIELD OF INVENTION 

This invention relates to the expression and secretion of recombinant fusion 
proteins from Caulobacter wherein a heterologous polypeptide is fused with all or part 
of the surface layer protein (S-layer protein) of the bacterium. 

BACKGROUND OF THE INVENTION 

Many bacteria assemble layers composed of repetitive, regularly aligned,
proteinaceous subunits on the outer surface of the cell. These layers are essentially
two-dimensional paracrystalline arrays and, being the outermost molecular layer of the
organism, directly interface with the environment. In Caulobacter, the S-layer protein
is synthesized by the cell in large quantities, and the S-layer completely envelops the cell
and thus appears to be a protective layer.

Caulobacter are natural inhabitants of most soil and freshwater environments
and may persist in wastewater treatment systems and effluents. The bacteria alternate
between a stalked cell that is attached to a surface and an adhesive, motile dispersal cell
that searches for a new surface to which it can attach and then convert to a stalked cell.
The bacteria attach tenaciously to nearly all surfaces and do so without producing the
extracellular enzymes or polysaccharide "slimes" that are characteristic of most other
surface-attached bacteria. Caulobacters have simple requirements for growth. The
organism is ubiquitous in the environment and has been isolated from oligotrophic to
mesotrophic habitats. Caulobacters are known for their ability to tolerate low-nutrient
stresses, for example, low phosphate levels.

All of the freshwater Caulobacter that produce an S-layer are similar and have
S-layers that are substantially the same under electron microscopy. The layers are
hexagonally arranged in all cases, with a similar centre-to-centre dimension (see: Walker,
S.G., et al. (1992) "Isolation and Comparison of the Paracrystalline Surface Layer
Proteins of Freshwater Caulobacters" J. Bacteriol. 174: 1783-1792).
16S rRNA sequence analysis of several S-layer-producing Caulobacter strains
shows that they group closely (see: Stahl, D.A. et al. (1992) "The Phylogeny of Marine
and Freshwater Caulobacters Reflects Their Habitat" J. Bacteriol. 174: 2193-2198).
Probing Southern blots with the S-layer gene from C. crescentus CB15
identifies a single band, consistent with the presence of a cognate gene (see:
MacRae, J.D. and J. Smit (1991) "Characterization of Caulobacters Isolated from
Wastewater Treatment Systems" Applied and Environmental Microbiology 57: 751-758).
Furthermore, antisera raised against the S-layer protein of CB15 react with
the S-layer protein of other Caulobacter (see: Walker, S.G. et al. (1992) [supra]). All
S-layer proteins isolated from Caulobacter may be substantially purified using the same
methods. All strains appear to have a polysaccharide species which may be required
for S-layer attachment (see: Walker, S.G. et al. (1992) [supra]).

The S-layers elaborated by freshwater isolates of Caulobacter are visibly
indistinguishable from the S-layer produced by Caulobacter strains CB2 and CB15.
The S-layer proteins from the latter strains have a molecular weight of approximately
100,000, although the sizes of S-layer proteins from other species and strains will vary.
The hydrophilic S-layer protein has been characterized both structurally and chemically.
It is composed of ring-like structures spaced at 22 nm intervals and arranged in a
hexagonal manner on the outer membrane. The S-layer is bound to the bacterial surface
and may be removed by low pH treatment or by treatment with a calcium chelator such
as EDTA.

The similarity of S-layer proteins in different strains of Caulobacter permits the
use of a cloned S-layer protein gene of one Caulobacter strain for retrieval of the
corresponding gene in other Caulobacter strains (see: Walker, S.G. et al. (1992)
[supra]; and MacRae, J.D. et al. (1991) [supra]).

Expression of a heterologous polypeptide as a fusion product with the S-layer
protein of Caulobacter provides advantages not previously seen in systems for
production of recombinant fusion proteins using other organisms such as E. coli and
Salmonella. All known Caulobacter strains are believed to be harmless and are nearly
ubiquitous in aquatic environments. In contrast, many Salmonella and E. coli strains
are pathogens. Consequently, expression and secretion of a heterologous polypeptide
using Caulobacter as a vehicle has the advantage that the expression system will be
stable in a variety of outdoor environments and may not present the problems associated
with the use of a pathogenic organism. Furthermore, Caulobacter are natural
biofilm-forming species and may be adapted for use in fixed-biofilm bioreactors. The
quantity of S-layer protein that is synthesized and secreted by Caulobacter is high,
reaching 12% of the cell protein.

There is an existing need to produce pure proteins and peptides economically
and in a manner that minimizes or simplifies the purification steps needed after
fermentation. Key commercial areas include the production of recombinant human and
animal therapeutic, antibiotic and vaccine peptides, industrial enzymes, protein
polymers, and antibacterial enzymes for foodstuffs. Many of these commercial
applications require low production costs, and few available expression systems
can meet such cost constraints. In addition, there are numerous research applications
where rapid methods to produce and purify proteins are needed to facilitate the
discovery stage. This is especially true where there is a desire to express a large
number of proteins of unknown function (from a collection of cloned cDNAs, for
example) or a large number of variants of a single protein (for example, resulting from
site-directed mutagenesis) in a search for variants with improved properties.

Generally, proteins must be secreted to be produced at low cost, primarily
because purifying a secreted target protein away from cell material costs much less.
However, even for secreted proteins, simple methods of separating the product from
spent culture and cells are important for cost reduction and ease of use.

An international patent application published as WO 97/34000 on September 18,
1997 describes the expression and secretion of recombinant proteins from Caulobacter
in which the recombinant protein is a fusion of all or part of the Caulobacter S-layer
protein with a heterologous protein of interest (see also: Bingle, W.H., et al. (1997)
"Linker Mutagenesis of the Caulobacter crescentus S-layer Protein: Toward a Definition
of an N-terminal Anchoring Region and a C-terminal Secretion Signal and the Potential
for Heterologous Protein Secretion" J. Bacteriol. 179: 601-611).

The Caulobacter S-layer secretion apparatus belongs to the "Type 1" category of
secretion systems, which are usually found in pathogenic bacteria and are noted for
their ability to secrete a wide variety of proteins, including large and hydrophilic
proteins. The Caulobacter protein secretion system is particularly useful for secreting
recombinant proteins.

The Caulobacter S-layer Type 1 secretion pathway requires only a C-terminal
secretion signal, typically comprising about the last 200 amino acids of the protein.
The export mechanism is capable of tolerating a wide variety of foreign proteins.
Recombinant proteins may therefore be conveniently produced as fusion proteins, with
the target protein fused to the C-terminal secretion signal. Depending on the
application, it may be desirable to remove the secretion signal following secretion.
Leaving the secretion signal in place may be suitable for many subunit vaccine
applications, where the remaining S-layer protein serves as a carrier.

A unique and desirable feature of fusion proteins produced by the Caulobacter
S-layer protein secretion system is that they form insoluble aggregates in the culture
medium. This is apparently a consequence of the S-layer sequences associated with the
secretion signal and reflects the fact that the protein normally self-assembles into a
two-dimensional crystalline layer on the bacterium's surface. These aggregates are
visible to the naked eye and are readily collected by simple filtration. Residual
bacterial cells are readily flushed away with simple water washes. A protein purity of
90% or better is routinely achieved with this simple purification procedure.



DESCRIPTION OF THE PRIOR ART 



Most current purification systems for recombinant proteins produced by
bacteria rely upon an affinity matrix to separate the target protein and to
concentrate it for subsequent purification steps. To accomplish this, genes
for recombinant proteins are commonly constructed so that they contain affinity tags,
which are protein sequences that bind to an affinity matrix. Commonly used
systems include the following:

(a) glutathione S-transferase (GST) tag, which binds to glutathione-Sepharose
matrices;

(b) maltose binding protein (MBP) tag, which binds to amylose matrices;

(c) multiple tandem histidine residues (e.g. "His-6") tag, which binds to
nickel-derivatized solid matrices; and

(d) protein A tag, which binds to immunoglobulin G (IgG)-derivatized Sepharose or
comparable matrices.

Prior art techniques were typically developed so that removal of a target protein
does not disrupt the tag and matrix association. Instead, enzymes that cleave specific
sequences of amino acids are employed. The enzyme cleavage sequence is positioned
between the tag and the desired recombinant protein, and enzymatic cleavage is effected
directly on the matrix with the fusion protein attached. If a secretion signal is used,
the cleavage site is usually positioned such that the secretion signal is separated from
the target recombinant protein during the cleavage step. The matrix is regenerated for
re-use only after the target recombinant protein has been purified away from the matrix.
Typical enzymes used in these methods are Factor Xa, enterokinase and collagenase.

Chemical cleavage is generally not used because the conditions required for
cleavage will disrupt the binding of affinity tag and matrix or destroy the matrix. When
chemical cleavage is used with recombinant fusion proteins to cleave a target protein
from a secretion signal and/or affinity tag, solubilization and denaturation processes
are generally employed. The expectation is that complete or nearly complete unfolding
of the protein is a prerequisite for effective cleavage.

Mild-acid cleavage is predicated on the inclusion, by happenstance or design, of
the acid-sensitive aspartate-proline dipeptide at a desired site for cleavage. The
protein to be cleaved is typically exposed to conditions that solubilize and/or
completely denature the protein prior to cleavage. The chaotropic agent guanidine
hydrochloride (used at 6-7 M) is commonly employed to denature and solubilize protein
prior to, or at the same time as, acid treatment. Alternatively, high concentrations of
acids that also serve as solubilizing agents (for example, 70-90% formic acid, 10%
acetic acid-pyridine, or relatively high concentrations of HCl (60 mM or more)) are
employed. Because such conditions would disrupt a tag/affinity matrix association,
direct cleavage of an affinity tag from the target protein while the protein remains
associated with an affinity matrix is not attempted.

General conditions for cleavage at aspartate-proline sites are described in
Current Protocols in Molecular Biology (supp. 28; chapter 16.4) John Wiley & Sons
Inc. 1994, and in Landon, M. "Cleavage at Aspartyl-Prolyl Bonds" in Methods in
Enzymology (1977) 47: 145-149. These references suggest that cleavage conditions
vary significantly between proteins and that cleavage might occur in some
instances without first denaturing or solubilizing the protein. In practice, however,
the latter circumstances are rare, and proteins to be subjected to acid cleavage at
Asp-Pro dipeptides are usually solubilized to a state where there is no visible
turbidity. Such solubilized protein will normally not pellet when centrifuged at
100,000 x g for 1 hour. It is now shown that mild-acid conditions may be used for
cleavage of aspartate-proline sites in Caulobacter S-layer fusion proteins without
placing the protein in a solubilized state as described above.

SUMMARY OF INVENTION 

This invention is based on the unexpected discovery that recombinant fusion
proteins produced by the Caulobacter S-layer protein secretion system can be cleaved
under mild-acid conditions without solubilizing the fusion protein.
Cleavage may be accomplished while the fusion protein is in the form of an insoluble
aggregate typical of the Caulobacter S-layer protein. Cleavage occurs at
aspartate-proline dipeptides, which may be in a heterologous protein portion of the
fusion protein or in the native Caulobacter S-layer portion. The dipeptide may be
placed at a desired location for cleavage by engineering the DNA encoding the fusion
protein to express the dipeptide at that location. A preferable location for
cleavage may be at or near the junction between a heterologous (target) protein and the
Caulobacter S-layer portion comprising the Caulobacter secretion signal, such that one
cleavage product will be the target protein in its entirety and substantially free of
extraneous amino acids.
The current invention makes it possible to cleave a heterologous (target) protein
from the S-layer protein portion using only mild-acid conditions, even while the fusion
protein is in an aggregated form. These cleavage conditions do not result in significant
solubilization of the S-layer protein portion.

This invention provides a method of cleaving a fusion protein including a first
component which comprises all or part of a Caulobacter S-layer protein including a
Caulobacter C-terminal secretion signal, and a second component heterologous to
Caulobacter. The fusion protein contains at least one aspartate-proline dipeptide. The
method comprises combining the fusion protein with an acid solution of a strength
insufficient to solubilize the fusion protein, for a time sufficient for cleavage of the
fusion protein at the aspartate-proline dipeptide. The acid solution may have a pH of
from about 1.5 (e.g., 1.5 ± 0.1) to about 2.5 (e.g., 2.5 ± 0.1), and preferably from about
1.65 (e.g., 1.65 ± 0.05) to about 2.35 (e.g., 2.35 ± 0.05). Preferred pH conditions may
be achieved using an acid equivalent in the range of about 5 to about 20 mM HCl.
The method is typically carried out at a temperature in the range of approximately room
temperature to about 50°C.

This invention also provides a method of preparing a DNA construct suitable for
expression of a fusion protein for use in the method of this invention. The
method comprises joining an upstream DNA segment, including DNA heterologous to
Caulobacter that encodes a protein of interest, to a downstream DNA segment
including DNA for a Caulobacter C-terminal secretion signal that does not encode an
aspartate-proline dipeptide. The upstream segment contains DNA encoding an
aspartate-proline dipeptide at or near the junction between the upstream and
downstream segments.
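A minimal Python sketch of the construct layout described above, assuming the standard genetic code and using short, made-up placeholder sequences. The upstream segment is a target open reading frame followed by Asp-Pro codons (GAC CCG is used here as one possible choice), and the downstream secretion-signal segment is checked to ensure it encodes no Asp-Pro. The helper names are hypothetical, and the checks are design-rule illustrations rather than a validated cloning tool.

```python
BASES = "TCAG"
# Standard genetic code; codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in-frame DNA (A/C/G/T) into one-letter amino acids; '*' marks a stop."""
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3; segment is out of frame")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def build_fusion_gene(target_orf: str, secretion_signal: str,
                      asp_pro_codons: str = "GACCCG") -> str:
    """Join upstream (target ORF + Asp-Pro codons) to the downstream secretion signal."""
    upstream = (target_orf + asp_pro_codons).upper()
    downstream = secretion_signal.upper()
    up_protein = translate(upstream)  # raises if the junction would shift the frame
    if "*" in up_protein:
        raise ValueError("upstream segment contains an in-frame stop codon")
    if not up_protein.endswith("DP"):
        raise ValueError("junction codons must encode Asp followed by Pro")
    if "DP" in translate(downstream):
        raise ValueError("downstream secretion-signal segment encodes Asp-Pro")
    return upstream + downstream


# Hypothetical placeholders: the target encodes MGSA, the signal stand-in encodes LTGS.
gene = build_fusion_gene("ATGGGCTCCGCC", "CTGACCGGCAGC")
print(gene)             # ATGGGCTCCGCCGACCCGCTGACCGGCAGC
print(translate(gene))  # MGSADPLTGS
```

Keeping the Asp-Pro codons in the upstream segment, as the text specifies, means the only engineered cleavage site sits at the junction, so the checks above reduce to "in frame, no premature stop, no Asp-Pro downstream".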

This invention also provides a method of preparing a fusion protein, comprising 
the steps of expressing a DNA construct as described above in Caulobacter and 
recovering said fusion protein once secreted by the Caulobacter. 

Once cleavage is accomplished according to this invention, the S-layer portion
comprising the Caulobacter secretion signal may remain as an insoluble aggregate. If
the target protein is soluble, the S-layer portion may be easily separated from the target
recombinant protein by simple centrifugation or filtration. Thus the system of
this invention facilitates separation as would a tag/affinity matrix system, except that
here the system is also the means of producing the insoluble matrix. In addition, the
insoluble matrix produced by this invention is resistant to the effects of the acid
treatment, allowing direct cleavage of the target recombinant protein. In this way, a
very inexpensive chemical cleavage method can be employed to economically retrieve
recombinant proteins from a bacterial fusion protein. In contrast to the cost of most
affinity matrices, there is little expense associated with the use of the S-layer
secretion signal, as it is simply a part of the fermentation/secretion process.


DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION 

Production of Recombinant Fusion Proteins Using 
the Caulobacter S-layer Secretion System 


Proteins may be produced using the Caulobacter S-layer Type 1 secretion
pathway, which requires only the C-terminal secretion signal of the Caulobacter
S-layer protein. This signal is the C-terminal portion of the S-layer protein, which
typically comprises about 200 amino acids (see: Bingle, et al. (1997) [supra]; and
WO 97/34000). Additional Caulobacter S-layer DNA upstream of the secretion signal
may also be present and may be desirable to encode portions of the S-layer protein
that contribute to aggregate formation by the secreted protein. Such additional
Caulobacter DNA may constitute most or all of the remainder of the DNA encoding the
S-layer protein.

Standard techniques (such as the methods described in WO 97/34000) may be used
to identify how much of the C-terminal portion of a particular Caulobacter S-layer
protein functions as the secretion signal.

Fusion proteins are commonly created by preparing DNA that codes
for the target protein and fusing it in frame with the C-terminal region of the S-layer
gene. There are numerous possible methods, of which the following are examples.

1. Oligonucleotide Chemical Synthesis. This involves the design of
complementary single strands, complete with desirable restriction endonuclease cut sites
at the ends, chemical synthesis of the strands, annealing, and cloning into a
plasmid vector juxtaposed to an appropriate portion of the C-terminal region of the
S-layer gene.

2. Production of the Target Gene DNA by Polymerase Chain Reaction (PCR)
Amplification of a Target Sequence. In this case, appropriate in-frame restriction
sites are incorporated into the short oligonucleotides used for amplification of a target
sequence, such that the final PCR product can be treated with the appropriate restriction
enzymes (to create the restriction site "sticky ends"), followed by cloning into a plasmid
vector juxtaposed to an appropriate portion of the C-terminal region of the S-layer
gene.

3. Adapting Restriction Endonuclease Cleavage Sites that are Native to a
Target Protein Gene Sequence for Fusion to the DNA Coding for the C-terminal S-layer
Secretion Signal to Accomplish In-frame Expression of a Chimeric Protein.
This can be accomplished by direct ligation (although an appropriate match is
uncommon), or by the use of adapter sequences or methods involving blunting of a
restriction site and subsequent blunt-end ligation to change the reading frame or join
unlike restriction site sticky ends.

There will be numerous convenient sites for fusion with the C-terminal regions
of the S-layer gene that lead to successful expression, secretion and aggregation of a
recombinant fusion protein. Some example positions are at or near the DNA sites
corresponding to amino acids 622, 690, 784, 892 and 907 of the C. crescentus S-layer
gene (see: Appendix 1 and WO 97/34000). Other sites of fusion with the S-layer gene
may also be employed. Most often a plasmid vector is designed such that the
C-terminal gene segment is resident on a plasmid with appropriate restriction sites
placed at the N-terminal junction of the S-layer fragment. Target recombinant protein
gene segments are then cloned into those restriction sites. It is typical to prepare
initial plasmid constructs that are replicated in E. coli. After a construct is produced,
it is typically transferred to a broad host range plasmid, which can then be introduced
into the appropriate Caulobacter strain by electroporation. Suitable broad host range
plasmids can be constructed from (but are not limited to) the IncQ, IncW and IncP1
plasmid incompatibility groups.
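The fusion positions above are given as amino acid numbers. The Python sketch below converts a residue number into the nucleotide span of its codon, assuming residue 1 is the initiator methionine encoded by nucleotides 1-3 of the coding sequence (that Appendix 1 uses this numbering is an assumption here). It also reports whether the S-layer portion still carries the native Asp-Pro at residues 692-693, under the further assumption that the S-layer portion of a fusion begins at the fusion residue. The function names are hypothetical.

```python
NATIVE_ASP_PRO = (692, 693)  # native Asp-Pro residues in the C. crescentus S-layer protein


def codon_span(residue: int) -> tuple[int, int]:
    """1-based inclusive nucleotide span of a residue's codon (residue 1 = start codon)."""
    if residue < 1:
        raise ValueError("residue numbering starts at 1")
    return 3 * residue - 2, 3 * residue


def keeps_native_asp_pro(fusion_residue: int) -> bool:
    """True if an S-layer portion starting at fusion_residue still includes Asp692-Pro693."""
    return fusion_residue <= NATIVE_ASP_PRO[0]


for site in (622, 690, 784, 892, 907):
    start, end = codon_span(site)
    print(f"residue {site}: nt {start}-{end}, "
          f"native Asp-Pro retained: {keeps_native_asp_pro(site)}")
# residue 622: nt 1864-1866, native Asp-Pro retained: True
# residue 690: nt 2068-2070, native Asp-Pro retained: True
# residue 784: nt 2350-2352, native Asp-Pro retained: False
# residue 892: nt 2674-2676, native Asp-Pro retained: False
# residue 907: nt 2719-2721, native Asp-Pro retained: False
```

Under these assumptions, fusions at residues 622 or 690 leave the native Asp-Pro in the secretion-signal portion, whereas fusions at 784 and beyond would need an engineered Asp-Pro for cleavage.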

The aspartate-proline (Asp-Pro) dipeptide can be introduced at the appropriate
site in the fusion protein in several ways. Some examples are:

(a) incorporating a DNA sequence necessary to express the Asp-Pro
dipeptide into the oligonucleotides used to prepare the target sequence, either by
oligonucleotide synthesis or PCR methods;

(b) preparing a DNA segment with appropriate restriction sites at the termini
so that an Asp-Pro dipeptide can be introduced (most often at the junction between the
S-layer and target gene) after a fusion recombinant S-layer gene has been made; and

(c) using a native Asp-Pro dipeptide in either the target DNA or the S-layer
segment (for example, an Asp-Pro dipeptide is located at amino acids 692 and 693 of
the C. crescentus S-layer protein and is suitable for fusions made near that site, such
as the fusion at about amino acid 690 used in Example 1).

The methods described above are not the only methods that may be used for
creating and expressing fusion recombinant S-layer proteins, nor is it necessary to have
the engineered genes resident on a plasmid. For example, the expressed gene may be
introduced into the chromosome (using well-known gene insertion or replacement
techniques) and still achieve secretion of the recombinant proteins (see WO 97/34000).
In some cases it may be desirable to produce recombinant fusion proteins as insertions
of heterologous DNA in the middle of the S-layer gene. In such a case, Asp-Pro
dipeptide sequences could be engineered at the N- and C-termini of the target peptide.

All possible codon combinations for Asp-Pro will work, but the CCA codon for
proline is not preferred because the corresponding tRNA is likely to be present in low
amounts in Caulobacter. The following is an approximate codon usage table for
C. crescentus.
Caulobacter crescentus Codon Usage Table
[Amino Acid] [Triplet Code] [Frequency per Thousand]

Phe UUU  2.5    Ser UCU   ?     Tyr UAU   ?     Cys UGU  0.6
Phe UUC 27.0    Ser UCC   ?     Tyr UAC  9.6    Cys UGC   ?
Leu UUA  0.0    Ser UCA   ?     Stop UAA 0.8    Stop UGA  ?
Leu UUG  4.4    Ser UCG 25.7    Stop UAG 0.6    Trp UGG   ?
Leu CUU  4.4    Pro CCU  2.5    His CAU  3.2    Arg CGU  7.6
Leu CUC 15.7    Pro CCC 15.5    His CAC 12.2    Arg CGC 44.7
Leu CUA   ?     Pro CCA   ?     Gln CAA  3.7    Arg CGA   ?
Leu CUG 72.3    Pro CCG   ?     Gln CAG 30.2    Arg CGG 12.1
Ile AUU  2.4    Thr ACU   ?     Asn AAU  4.1    Ser AGU   ?
Ile AUC 49.0    Thr ACC   ?     Asn AAC 23.8    Ser AGC 14.9
Ile AUA  0.3    Thr ACA   ?     Lys AAA  2.7    Arg AGA  0.4
Met AUG 25.7    Thr ACG   ?     Lys AAG 37.9    Arg AGG  1.1
Val GUU   ?     Ala GCU   ?     Asp GAU 11.1    Gly GGU  9.5
Val GUC 42.7    Ala GCC   ?     Asp GAC 48.5    Gly GGC 64.8
Val GUA   ?     Ala GCA  2.2    Glu GAA 20.5    Gly GGA  2.3
Val GUG 30.7    Ala GCG   ?     Glu GAG 45.4    Gly GGG  7.7
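As a sketch of the codon choice discussed above, the following Python snippet enumerates DNA encodings of Asp-Pro, drops the CCA proline codon, and lists pairs with the GAC aspartate codon first because it is the more frequent aspartate codon in the table (48.5 versus 11.1 per thousand). Proline codons are not ranked, since not all proline frequencies are legible. The final check is an added illustration, not taken from the source: depending on the preceding base, a GAT-CC... pair can create a BamHI site (GGATCC), which matters if BamHI is also used for cloning. The helper name is hypothetical.

```python
from itertools import product

# Aspartate codon frequencies per thousand codons, from the table above (DNA notation).
ASP_FREQ = {"GAT": 11.1, "GAC": 48.5}
PRO_CODONS = ("CCT", "CCC", "CCA", "CCG")
AVOID = {"CCA"}  # proline codon not preferred in Caulobacter

candidates = [asp + pro for asp, pro in product(ASP_FREQ, PRO_CODONS) if pro not in AVOID]
# Stable sort: more frequent Asp codon first; proline order otherwise unchanged.
candidates.sort(key=lambda pair: -ASP_FREQ[pair[:3]])
print(candidates)
# ['GACCCT', 'GACCCC', 'GACCCG', 'GATCCT', 'GATCCC', 'GATCCG']

BAMHI_SITE = "GGATCC"


def creates_bamhi_site(preceding: str, pair: str) -> bool:
    """True if placing the codon pair after `preceding` DNA forms a BamHI site.

    Only the last 5 preceding bases matter: any new 6-bp site must overlap the pair.
    """
    return BAMHI_SITE in preceding[-5:] + pair


print(creates_bamhi_site("ATGG", "GATCCT"))  # True
print(creates_bamhi_site("ATGG", "GACCCG"))  # False
```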
Large quantities (e.g. 12% of total cell protein / 3% of input organic carbon) of a
wide range of proteins can be produced, with yields on the order of 250 mg/liter of
batch culture. Fusion proteins with 35 kDa of target peptide are secreted with little
difficulty, although proteins with multiple cysteines may be more difficult to express.
Post-expression glycosylation of proteins does not occur, an advantage for most peptide
expression applications.
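The following Python sketch turns the yield figures above into a rough estimate of how much target protein cleavage could release per litre. It assumes that all secreted fusion protein is recovered, that the target's share of the fusion mass equals the ratio of their molecular masses, and it uses the 40-80% cleavage efficiencies reported later in this document. The 70 kDa fusion mass is a hypothetical value chosen only for illustration.

```python
def released_target_mg_per_l(fusion_mg_per_l: float, target_kda: float,
                             fusion_kda: float, cleavage_efficiency: float) -> float:
    """Rough estimate of target protein released, in mg per litre of culture."""
    if not 0.0 <= cleavage_efficiency <= 1.0:
        raise ValueError("cleavage_efficiency must be a fraction between 0 and 1")
    if not 0.0 < target_kda <= fusion_kda:
        raise ValueError("target mass must be positive and no larger than the fusion")
    mass_fraction = target_kda / fusion_kda  # share of fusion mass that is target
    return fusion_mg_per_l * mass_fraction * cleavage_efficiency


# 250 mg/l of fusion protein; 35 kDa target in a hypothetical 70 kDa fusion.
for efficiency in (0.4, 0.8):
    mg = released_target_mg_per_l(250, 35, 70, efficiency)
    print(f"{efficiency:.0%} cleavage: {mg:.1f} mg/l")
# 40% cleavage: 50.0 mg/l
# 80% cleavage: 100.0 mg/l
```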

Host Expression Strains

For secretion of recombinant fusion S-layer proteins, the Caulobacter strain will
preferably be one which has lost the ability to produce a native S-layer protein while
retaining a fully functional S-layer protein secretion apparatus. Such strains may be
obtained by screening for mutants that have spontaneously become S-layer protein
negative, or by directed genetic manipulation, such as (but not limited to) the insertion
of a drug resistance cassette in the middle of the S-layer gene or the substitution of a
version of the S-layer gene from which a sizeable internal region has been deleted
(see: Bingle et al. (1997) [supra]; Bingle et al. (1997) "Cell Surface Display of a
Pseudomonas aeruginosa PAK Pilin Peptide with the Paracrystalline Layer of
Caulobacter crescentus" Molec. Microbiol. 26: 277-288; and Edwards and Smit (1991)
"A Transducing Bacteriophage for Caulobacter crescentus Uses the Paracrystalline
Surface Layer Protein as a Receptor" J. Bacteriol. 173: 5568-5572). In the case of
genetic manipulation, a common method for producing such strains is to modify a copy
of the S-layer gene on a plasmid and then use well-known gene replacement methods
to substitute the modified gene for the native gene in the Caulobacter chromosome (see:
Edwards and Smit (1991) [supra]).

If an entire S-layer gene is to be used for production of a recombinant protein
(via insertion of a target sequence), strains defective in the production of the
lipopolysaccharide (LPS) used for S-layer attachment to the bacterial surface can be
used. These can be prepared by forcing Caulobacter to grow without exogenous
calcium. Under these conditions, mutants arise that are uniformly defective in
producing a proficient version of the S-layer LPS (see: Walker, S.G. et al. (1994)
"Characteristics of Mutants of Caulobacter crescentus Defective in Surface Attachment
of the Paracrystalline Layer" J. Bacteriol. 176: 6312-6323).

All Caulobacter S-layer-producing strains are suitable for this technology. One
may isolate the S-layer gene from a particular strain (using homology between
Caulobacter S-layers to design probes to detect and clone the S-layer genes) and adapt
the C-terminal region for recombinant protein expression, in a manner similar to that
done for C. crescentus strains (see: MacRae and Smit (1991) [supra], and Walker, S.G.
et al. (1992) [supra]). Alternatively, one may construct recombinant fusion S-layer
genes using the C. crescentus S-layer gene and express the recombinant genes in
alternate Caulobacter hosts.

Freshwater Caulobacter producing S-layers may be readily detected by negative
stain transmission electron microscopy. Caulobacter may be isolated using
the methods outlined by MacRae and Smit (1991) [supra], which take advantage of the
facts that Caulobacter can tolerate periods of starvation while other soil and water
bacteria may not, and that they all produce a distinctive stalk structure visible by light
microscopy (using either phase contrast or standard dye staining methods). Once
Caulobacter strains are isolated, in a typical procedure colonies may be suspended in
2% ammonium molybdate negative stain, applied to plastic-filmed, carbon-stabilized
300 or 400 mesh copper or nickel grids and examined in a transmission electron
microscope at 60 kV accelerating voltage (see: Smit, J. (1986) "Protein Surface
Layers of Bacteria", in Outer Membranes as Model Systems (M. Inouye, ed.), J. Wiley
& Sons, pp. 343-376). S-layers are seen as two-dimensional geometric patterns, most
readily on those cells in a colony that have lysed and released their internal contents.

Recombinant Protein Purification 

Secreted proteins are separated and shed into the culture medium as a macroscopic
precipitate (the "aggregate" referred to herein). The shedding phenomenon is a
consequence of the absence of the N-terminal region of the S-layer protein from the
expressed recombinant protein, or of the loss of the lipopolysaccharide species used
for S-layer attachment by the Caulobacter (see: Walker, S.G. et al. (1994) [supra]).
Typically, the aggregate forms as loose, gel-like lumps of pure protein that can readily
be retrieved and separated from the bacteria by simple filtration.

The aggregate may be readily separated from a soluble cleaved target protein by
any suitable technique, such as filtration or centrifugation. If the target protein is
insoluble once cleaved, it may be convenient to solubilize one or both of the
proteins (for example in 8 M urea or 6 M guanidine HCl) and separate them by
chromatography. In this way, only two protein species need to be separated.


Cleavage of Fusion Proteins 

General procedures for performing mild-acid cleavage are known from the
prior art, as described above. In the method of this invention, conditions are adjusted
to avoid destruction of the target protein or solubilization of the aggregate containing
the S-layer secretion signal. Excess acid or too high a temperature may increase the
occurrence over time of random cleavages along the length of the fusion protein. Such
random cleavages are to be avoided, since they may lead to undersized fragments of
the fusion protein or solubilization of the aggregated S-layer portion.


Good yields of target protein with minimal random breaks in the fusion protein
may generally be achieved by using 5-20 mM HCl (or an equivalent concentration when
employing another acid). The corresponding pH of these conditions (unbuffered acid
solution) is from about 2.3 to about 1.7. Time and temperature are preferably adjusted
by routine monitoring to achieve the desired cleavage while minimizing random breaks.
For example, temperature may range from room temperature to about 50°C, and the
time of treatment may range from about 12 to about 72 hours. Times or temperatures
outside these ranges are permissible depending upon the strength of the acid and the
acceptable yield. Generally, lower yields are obtained with lower acid strength, shorter
times or lower temperatures.
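The pH values quoted above follow from treating HCl as a fully dissociated strong acid in unbuffered water, so that pH is approximately -log10 of the molar HCl concentration. This Python sketch reproduces that arithmetic and inverts it to find the HCl concentration for a target pH. It ignores activity effects and any buffering by the protein suspension, so measured pH may differ somewhat; the function names are illustrative.

```python
import math


def ph_from_hcl_mm(hcl_mm: float) -> float:
    """Approximate pH of unbuffered HCl (mM), assuming complete dissociation."""
    if hcl_mm <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(hcl_mm / 1000.0)


def hcl_mm_for_ph(ph: float) -> float:
    """HCl concentration (mM) that gives the requested pH under the same assumption."""
    return 1000.0 * 10 ** (-ph)


for mm in (5, 10, 20):
    print(f"{mm} mM HCl -> pH {ph_from_hcl_mm(mm):.2f}")
# 5 mM HCl -> pH 2.30
# 10 mM HCl -> pH 2.00
# 20 mM HCl -> pH 1.70

print(f"pH 2.35 -> {hcl_mm_for_ph(2.35):.1f} mM; pH 1.65 -> {hcl_mm_for_ph(1.65):.1f} mM")
# pH 2.35 -> 4.5 mM; pH 1.65 -> 22.4 mM
```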

In the following examples, cleavage efficiencies on the order of 40-80% were
achieved using conditions the same as or similar to the following alternatives:

- 5 mM HCl at 50°C for 48-72 hours

- 20 mM HCl at 30°C for 48-72 hours.

Conditions in excess of the aforementioned values may be employed in some
cases, with the possibility of random breaks increasing, particularly with increased acid
strength or temperature. In the following examples, significant random cleavage
occurred with 50 mM HCl at 50°C after 48 hours.

Any acid that is normally used in solutions to which proteins are exposed may be
employed in this invention. Acids that have a deleterious effect on proteins under
dilute conditions should be avoided. For example, HCl or an equivalent amount of
H2SO4 may be used in this invention, but oxidizing acids such as nitric acid may not
be suitable.

Example 1. Cleavage of artificial silk protein sequences
from a secretion signal containing a native aspartate-proline cleavage site.

An artificial protein sequence resembling spider silk was constructed by
synthesis of partially overlapping and complementary DNA oligomers, which were
then completed to a full duplex DNA by Taq polymerase extension, creating a
sequence that coded for 97 amino acids. The resulting DNA sequence and
corresponding amino acid sequence are shown in Appendix 2.

The DNA sequence shown in Appendix 2 was cloned into a gene carrier
sequence residing in a pUC8 plasmid cloning vector. The gene segment carrier had
BamHI restriction sites at each end and an internal BglII site. This combination of
restriction sites allowed the production of multimers of the above sequence, relying on
the fact that BamHI sticky ends will ligate to BglII sticky ends, with the loss of both
restriction sites. Thus one copy of the silk-like sequence within the gene segment
carrier can be put inside a second copy of the same to produce a dimer. Using this
principle, an 8X repeat was produced, fused to DNA encoding the S-layer secretion
signal corresponding to the C-terminal portion of the C. crescentus S-layer protein from
about amino acid 690 onwards (see: Appendix 1). This fusion protein gene was
introduced into strain CB2A on a broad host range plasmid vector. The 8X multimer
appeared to be unstable, resulting in recombination events that reduced the 8X multimer
to a 3X size. The 3-fold repeat of the above 97 amino acid sequence, fused to the
S-layer secretion signal, was secreted. Protein was collected and treated with 5 mM HCl
for 2 days at 50° C. The result was the liberation of about 80% of soluble silk-like
polymer, which was readily separated by filtration from the S-layer protein, which
remained completely aggregated under these conditions. Cleavage occurred at the native
aspartate-proline dimer in the Caulobacter S-layer signal region (see: Appendix 1,
amino acids numbered 692-693).
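The multimerization trick works because BamHI (G^GATCC) and BglII (A^GATCT) leave the same 5'-GATC overhang, so their ends ligate to each other, while the hybrid junction matches neither recognition site. A minimal sketch of that logic; the constant and function names are hypothetical, and it only models 6-bp palindromic sites cut after the first base:

```python
BAMHI = "GGATCC"  # BamHI recognition site, cut G^GATCC
BGLII = "AGATCT"  # BglII recognition site, cut A^GATCT

def sticky_end(site: str) -> str:
    """5' overhang left by an enzyme cutting after the first base of a 6-bp palindromic site."""
    return site[1:5]

def hybrid_junctions(site_a: str, site_b: str) -> tuple[str, str]:
    """Top-strand sequences of the two junctions formed when an end cut at
    site_a is ligated to an end cut at site_b (one on each side of an insert)."""
    if sticky_end(site_a) != sticky_end(site_b):
        raise ValueError("overhangs are not compatible")
    core = sticky_end(site_a)
    return site_a[0] + core + site_b[-1], site_b[0] + core + site_a[-1]

junctions = hybrid_junctions(BAMHI, BGLII)
print(junctions)                                 # ('GGATCT', 'AGATCC')
print([j in (BAMHI, BGLII) for j in junctions])  # [False, False]
```

Because both recognition sites are palindromic, checking the top strand is enough to show that neither enzyme recognizes the junctions. This is why, presumably, the carrier can be reopened at its remaining BamHI and BglII sites for further rounds of insertion.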

Example 2. Cleavage of the salmonid virus Infectious Pancreatic Necrosis
Virus (IPNV) surface glycoprotein candidate vaccine sequence from an
S-layer secretion signal containing a native aspartate-proline site.

The surface glycoprotein of the IPNV strain is a vaccine candidate. For this
example and Example 4, the sequence of the first 257 amino acids of the mature protein
and the corresponding DNA sequence as shown in Appendix 3 were used.

DNA encoding a segment of the major surface glycoprotein gene of IPNV
specifying amino acids 145-257 of the protein was fused to a DNA sequence specifying
two putative T-cell activating epitopes: MVF (SEQ ID No: 1; LSEIKGVIVHRLEGV,
derived from Measles Virus protein F) and P2 (SEQ ID No: 2; QYIKANSKFIGITEL,
derived from tetanus toxoid protein). The T-cell epitopes were positioned on the
C-terminal end of the IPNV sequence. This chimeric protein was in turn fused in frame
with the C. crescentus S-layer gene at about amino acid position 690 of the gene and
introduced into Caulobacter on a broad host range plasmid vector. The resulting
secreted protein was collected and treated with 5 mM HCl for 2 days at 50° C.
Cleavage occurred at the native aspartate-proline dimer described in Example 1. The
result was the liberation of about 75% of soluble vaccine candidate chimeric protein
from the S-layer secretion signal, which remained aggregated.
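The epitopes are given in one-letter code in the text and in three-letter code in the sequence listing, so a small converter is a convenient consistency check. A sketch only: the dictionary covers just the 20 standard residues, the names are hypothetical, and an unrecognized code (for example an OCR-damaged "Gin" or "lie") raises a `KeyError`, which is the intended way of flagging a transcription error.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(three_letter: str) -> str:
    """Convert a whitespace-separated three-letter sequence to one-letter code."""
    return "".join(THREE_TO_ONE[res] for res in three_letter.split())

# SEQ ID No: 1 (MVF) and SEQ ID No: 2 (P2) in three-letter code
MVF = "Leu Ser Glu Ile Lys Gly Val Ile Val His Arg Leu Glu Gly Val"
P2 = "Gln Tyr Ile Lys Ala Asn Ser Lys Phe Ile Gly Ile Thr Glu Leu"

assert to_one_letter(MVF) == "LSEIKGVIVHRLEGV"
assert to_one_letter(P2) == "QYIKANSKFIGITEL"
```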
Example 3. Cleavage of segments of an E. coli type I pilus tip subunit from
an S-layer secretion signal containing a native aspartate-proline cleavage site.

The FimH gene product is the tip pilus subunit of the E. coli strains involved
in urinary tract infections. Two segments, T3 (specifying the first 145 amino acids
of the mature peptide) and T7 (specifying the entire 258 amino acids of the mature
peptide), were fused to the S-layer secretion signal at about amino acid 690 of the
S-layer sequence. The T3 and T7 sequences are shown in Appendix 4.

The fusion protein genes were introduced into strain CB2A on a broad host
range plasmid vector. In both cases the resulting secreted protein was collected and
treated with 5 mM HCl for 2 days at 50° C. In both cases, the result was the liberation
of about 50% of soluble vaccine candidate chimeric protein from the S-layer secretion
signal, which remained aggregated. Cleavage occurred at the native aspartate-proline
dimer described in Example 1.



Example 4. Cleavage of the salmonid virus IPNV surface glycoprotein
candidate vaccine sequence from an S-layer secretion signal containing
an introduced aspartate-proline cleavage site.

A segment of the major surface glycoprotein gene of IPNV specifying amino
acids 1-257 of the protein shown in Appendix 3 was fused to a DNA sequence
specifying a peptide containing an aspartate-proline dipeptide (SEQ ID No: 3;
SPLGPAGDPEAS) such that the aspartate-proline dipeptide was positioned very near
the C-terminus of the chimeric protein. This chimeric protein was in turn fused in
frame with the C. crescentus S-layer gene at about amino acid position 784 of the gene
and introduced into strain CB2A on a broad host range plasmid vector. The resulting
secreted protein was collected and treated with 5 mM HCl for 2 days at 50° C.
Cleavage occurred at the introduced aspartate-proline dipeptide. The result was the
liberation of about 40% of insoluble vaccine candidate chimeric protein from the
S-layer secretion signal, which remained aggregated.
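All four examples rely on the acid lability of the Asp-Pro peptide bond. The sketch below locates candidate sites in a one-letter sequence and returns the fragments that complete cleavage would give. It is an idealized model: the examples report only 40-80% cleavage, and random breaks elsewhere are ignored. The function names are hypothetical. The usage lines apply it to the introduced linker (SEQ ID No: 3) and to S-layer residues 689-700 taken from SEQ ID No: 5.

```python
import re

def asp_pro_sites(seq: str) -> list[int]:
    """1-based positions of every Asp (D) immediately followed by Pro (P)."""
    return [m.start() + 1 for m in re.finditer(r"D(?=P)", seq)]

def cleave_asp_pro(seq: str) -> list[str]:
    """Idealized acid cleavage: split every Asp-Pro bond, Asp staying on the N-terminal fragment."""
    fragments, start = [], 0
    for pos in asp_pro_sites(seq):
        fragments.append(seq[start:pos])  # pos is 1-based, so this slice ends with the Asp
        start = pos
    fragments.append(seq[start:])
    return fragments

linker = "SPLGPAGDPEAS"  # SEQ ID No: 3
print(asp_pro_sites(linker))   # [8]
print(cleave_asp_pro(linker))  # ['SPLGPAGD', 'PEAS']

# S-layer residues 689-700, shifted to full-length numbering
window = "FSADPAFGGFET"
print([688 + p for p in asp_pro_sites(window)])  # [692]
```

The last line recovers the native Asp692-Pro693 site cited in Example 1.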

Longer DNA and amino acid sequences referred to above are set out in the
following Appendices, which are part of this description. Appendix 1 sets out the
complete nucleotide sequence of the C. crescentus S-layer gene (SEQ ID No: 4) with
the upstream sequence, including the -35 and -10 sites of the promoter region and the
Shine-Dalgarno sequence. The start codon is at nucleotide 101 and the coding sequence
runs to and includes nucleotide 3179. The amino acid sequence of the C. crescentus
S-layer protein (SEQ ID No: 5) included in Appendix 1 is predicted from the DNA
sequence. Appendix 2 sets out the artificial spider silk DNA sequence (SEQ ID No: 6)
used in Example 1 and the corresponding amino acid sequence (SEQ ID No: 7).
Appendix 3 sets out the DNA sequence (SEQ ID No: 8) and corresponding amino acid
sequence (SEQ ID No: 9) of the first 257 amino acids of IPNV as described in
Examples 2 and 4. Appendix 4 sets out the T3 protein sequence (SEQ ID No: 10) and
the T7 protein sequence (SEQ ID No: 11) as described in Example 3.

All publications, patents and patent applications referred to herein are hereby
incorporated by reference. While this invention has been described according to
particular embodiments and by reference to certain examples, it will be apparent to
those of skill in the art that variations and modifications of the invention as described
herein fall within the spirit and scope of the attached claims.
SEQUENCE LISTING

<110> Smit, John 

<120> CLEAVAGE OF CAULOBACTER PRODUCED 
RECOMBINANT FUSION PROTEINS 

<130> 08106-004001 

<140> 09/743,731 
<141> 2001-01-12 

<150> PCT/CA99/00637 
<151> 1999-07-14 

<150> CA 2,237,704 
<151> 1998-07-14 

<160> 11 

<170> FastSEQ for Windows Version 4.0 

<210> 1
<211> 15
<212> PRT
<213> Artificial Sequence

<220>
<223> Fusion protein

<400> 1
Leu Ser Glu Ile Lys Gly Val Ile Val His Arg Leu Glu Gly Val
1 5 10 15

<210> 2
<211> 15
<212> PRT
<213> Artificial Sequence

<220>
<223> Fusion protein

<400> 2
Gln Tyr Ile Lys Ala Asn Ser Lys Phe Ile Gly Ile Thr Glu Leu
1 5 10 15

<210> 3 
<211> 12 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Synthetically generated peptide 
<400> 3 

Ser Pro Leu Gly Pro Ala Gly Asp Pro Glu Ala Ser 
1 5 10

<210> 4
<211> 3300
<212> DNA
<213> Caulobacter crescentus

<220>
<221> CDS
<222> (101)...(3179)

<400> 4
gctattgtcg acgtatgacg tttgctctat agccatcgct gctcccatgc gcgccactcg 60
gtcgcagggg gtgtgggatt ttttttggga gacaatcctc atg gcc tat acg acg 115
Met Ala Tyr Thr Thr
1 5

gcc cag ttg gtg act gcc tac acc aac gcc aac ctc ggc aag gcg cct 163
Ala Gln Leu Val Thr Ala Tyr Thr Asn Ala Asn Leu Gly Lys Ala Pro
10 15 20

gac gcc gcc acc acg ctg acg ctc gac gcg tac gcg act caa acc cag 211
Asp Ala Ala Thr Thr Leu Thr Leu Asp Ala Tyr Ala Thr Gln Thr Gln
25 30 35

acg ggc ggc ctc tcg gac gcc gct gcg ctg acc aac acc ctg aag ctg 259
Thr Gly Gly Leu Ser Asp Ala Ala Ala Leu Thr Asn Thr Leu Lys Leu
40 45 50

gtc aac agc acg acg gct gtt gcc atc cag acc tac cag ttc ttc acc 307
Val Asn Ser Thr Thr Ala Val Ala Ile Gln Thr Tyr Gln Phe Phe Thr
55 60 65

ggc gtt gcc ccg tcg gcc gct ggt ctg gac ttc ctg gtc gac tcg acc 355
Gly Val Ala Pro Ser Ala Ala Gly Leu Asp Phe Leu Val Asp Ser Thr
70 75 80 85

acc aac acc aac gac ctg aac gac gcg tac tac tcg aag ttc gct cag 403
Thr Asn Thr Asn Asp Leu Asn Asp Ala Tyr Tyr Ser Lys Phe Ala Gln
90 95 100

gaa aac cgc ttc atc aac ttc tcg atc aac ctg gcc acg ggc gcc ggc 451
Glu Asn Arg Phe Ile Asn Phe Ser Ile Asn Leu Ala Thr Gly Ala Gly
105 110 115

gcc ggc gcg acg gct ttc gcc gcc gcc tac acg ggc gtt tcg tac gcc 499
Ala Gly Ala Thr Ala Phe Ala Ala Ala Tyr Thr Gly Val Ser Tyr Ala
120 125 130

cag acg gtc gcc acc gcc tat gac aag atc atc ggc aac gcc gtc gcg 547
Gln Thr Val Ala Thr Ala Tyr Asp Lys Ile Ile Gly Asn Ala Val Ala
135 140 145

acc gcc gct ggc gtc gac gtc gcg gcc gcc gtg gct ttc ctg agc cgc 595
Thr Ala Ala Gly Val Asp Val Ala Ala Ala Val Ala Phe Leu Ser Arg
150 155 160 165

cag gcc aac atc gac tac ctg acc gcc ttc gtg cgc gcc aac acg ccg 643
Gln Ala Asn Ile Asp Tyr Leu Thr Ala Phe Val Arg Ala Asn Thr Pro
170 175 180

ttc acg gcc gct gcc gac atc gat ctg gcc gtc aag gcc gcc ctg atc 691
Phe Thr Ala Ala Ala Asp Ile Asp Leu Ala Val Lys Ala Ala Leu Ile
185 190 195

ggc acc atc ctg aac gcc gcc acg gtg tcg ggc atc ggt ggt tac gcg 739
Gly Thr Ile Leu Asn Ala Ala Thr Val Ser Gly Ile Gly Gly Tyr Ala
200 205 210

acc gcc acg gcc gcg atg atc aac gac ctg tcg gac ggc gcc ctg tcg 787
Thr Ala Thr Ala Ala Met Ile Asn Asp Leu Ser Asp Gly Ala Leu Ser
215 220 225

acc gac aac gcg gct ggc gtg aac ctg ttc acc gcc tat ccg tcg tcg 835
Thr Asp Asn Ala Ala Gly Val Asn Leu Phe Thr Ala Tyr Pro Ser Ser
230 235 240 245

ggc gtg tcg ggt tcg acc ctc tcg ctg acc acc ggc acc gac acc ctg 883
Gly Val Ser Gly Ser Thr Leu Ser Leu Thr Thr Gly Thr Asp Thr Leu
250 255 260

acg ggc acc gcc aac aac gac acg ttc gtt gcg ggt gaa gtc gcc ggc 931
Thr Gly Thr Ala Asn Asn Asp Thr Phe Val Ala Gly Glu Val Ala Gly
265 270 275

gct gcg acc ctg acc gtt ggc gac acc ctg agc ggc ggt gct ggc acc 979
Ala Ala Thr Leu Thr Val Gly Asp Thr Leu Ser Gly Gly Ala Gly Thr
280 285 290

gac gtc ctg aac tgg gtg caa gct gct gcg gtt acg gct ctg ccg acc 1027
Asp Val Leu Asn Trp Val Gln Ala Ala Ala Val Thr Ala Leu Pro Thr
295 300 305

ggc gtg acg atc tcg ggc atc gaa acg atg aac gtg acg tcg ggc gct 1075
Gly Val Thr Ile Ser Gly Ile Glu Thr Met Asn Val Thr Ser Gly Ala
310 315 320 325

gcg atc acc ctg aac acg tct tcg ggc gtg acg ggt ctg acc gcc ctg 1123
Ala Ile Thr Leu Asn Thr Ser Ser Gly Val Thr Gly Leu Thr Ala Leu
330 335 340

aac acc aac acc agc ggc gcg gct caa acc gtc acc gcc ggc gct ggc 1171
Asn Thr Asn Thr Ser Gly Ala Ala Gln Thr Val Thr Ala Gly Ala Gly
345 350 355

cag aac ctg acc gcc acg acc gcc gct caa gcc gcg aac aac gtc gcc 1219
Gln Asn Leu Thr Ala Thr Thr Ala Ala Gln Ala Ala Asn Asn Val Ala
360 365 370

gtc gac ggg cgc gcc aac gtc acc gtc gcc tcg acg ggc gtg acc tcg 1267
Val Asp Gly Arg Ala Asn Val Thr Val Ala Ser Thr Gly Val Thr Ser
375 380 385

ggc acg acc acg gtc ggc gcc aac tcg gcc gct tcg ggc acc gtg tcg 1315
Gly Thr Thr Thr Val Gly Ala Asn Ser Ala Ala Ser Gly Thr Val Ser
390 395 400 405

gtg agc gtc gcg aac tcg agc acg acc acc acg ggc gct atc gcc gtg 1363
Val Ser Val Ala Asn Ser Ser Thr Thr Thr Thr Gly Ala Ile Ala Val
410 415 420

acc ggt ggt acg gcc gtg acc gtg gct caa acg gcc ggc aac gcc gtg 1411
Thr Gly Gly Thr Ala Val Thr Val Ala Gln Thr Ala Gly Asn Ala Val
425 430 435

aac acc acg ttg acg caa gcc gac gtg acc gtg acc ggt aac tcc agc 1459
Asn Thr Thr Leu Thr Gln Ala Asp Val Thr Val Thr Gly Asn Ser Ser
440 445 450

acc acg gcc gtg acg gtc acc caa acc gcc gcc gcc acc gcc ggc gct 1507
Thr Thr Ala Val Thr Val Thr Gln Thr Ala Ala Ala Thr Ala Gly Ala
455 460 465

acg gtc gcc ggt cgc gtc aac ggc gct gtg acg atc acc gac tct gcc 1555
Thr Val Ala Gly Arg Val Asn Gly Ala Val Thr Ile Thr Asp Ser Ala
470 475 480 485

gcc gcc tcg gcc acg acc gcc ggc aag atc gcc acg gtc acc ctg ggc 1603
Ala Ala Ser Ala Thr Thr Ala Gly Lys Ile Ala Thr Val Thr Leu Gly
490 495 500

agc ttc ggc gcc gcc acg atc gac tcg agc gct ctg acg acc gtc aac 1651
Ser Phe Gly Ala Ala Thr Ile Asp Ser Ser Ala Leu Thr Thr Val Asn
505 510 515

ctg tcg ggc acg ggc acc tcg ctc ggc atc ggc cgc ggc gct ctg acc 1699
Leu Ser Gly Thr Gly Thr Ser Leu Gly Ile Gly Arg Gly Ala Leu Thr
520 525 530

gcc acg ccg acc gcc aac acc ctg acc ctg aac gtc aat ggt ctg acg 1747
Ala Thr Pro Thr Ala Asn Thr Leu Thr Leu Asn Val Asn Gly Leu Thr
535 540 545

acg acc ggc gcg atc acg gac tcg gaa gcg gct gct gac gat ggt ttc 1795
Thr Thr Gly Ala Ile Thr Asp Ser Glu Ala Ala Ala Asp Asp Gly Phe
550 555 560 565

acc acc atc aac atc gct ggt tcg acc gcc tct tcg acg atc gcc agc 1843
Thr Thr Ile Asn Ile Ala Gly Ser Thr Ala Ser Ser Thr Ile Ala Ser
570 575 580

ctg gtg gcc gcc gac gcg acg acc ctg aac atc tcg ggc gac gct cgc 1891
Leu Val Ala Ala Asp Ala Thr Thr Leu Asn Ile Ser Gly Asp Ala Arg
585 590 595

gtc acg atc acc tcg cac acc gct gcc gcc ctg acg ggc atc acg gtg 1939
Val Thr Ile Thr Ser His Thr Ala Ala Ala Leu Thr Gly Ile Thr Val
600 605 610

acc aac agc gtt ggt gcg acc ctc ggc gcc gaa ctg gcg acc ggt ctg 1987
Thr Asn Ser Val Gly Ala Thr Leu Gly Ala Glu Leu Ala Thr Gly Leu
615 620 625

gtc ttc acg ggc ggc gct ggc cgt gac tcg atc ctg ctg ggc gcc acg 2035
Val Phe Thr Gly Gly Ala Gly Arg Asp Ser Ile Leu Leu Gly Ala Thr
630 635 640 645

acc aag gcg atc gtc atg ggc gcc ggc gac gac acc gtc acc gtc agc 2083
Thr Lys Ala Ile Val Met Gly Ala Gly Asp Asp Thr Val Thr Val Ser
650 655 660

tcg gcg acc ctg ggc gct ggt ggt tcg gtc aac ggc ggc gac ggc acc 2131
Ser Ala Thr Leu Gly Ala Gly Gly Ser Val Asn Gly Gly Asp Gly Thr
665 670 675

gac gtt ctg gtg gcc aac gtc aac ggt tcg tcg ttc agc gct gac ccg 2179
Asp Val Leu Val Ala Asn Val Asn Gly Ser Ser Phe Ser Ala Asp Pro
680 685 690

gcc ttc ggc ggc ttc gaa acc ctc cgc gtc gct ggc gcg gcg gct caa 2227
Ala Phe Gly Gly Phe Glu Thr Leu Arg Val Ala Gly Ala Ala Ala Gln
695 700 705

ggc tcg cac aac gcc aac ggc ttc acg gct ctg caa ctg ggc gcg acg 2275
Gly Ser His Asn Ala Asn Gly Phe Thr Ala Leu Gln Leu Gly Ala Thr
710 715 720 725

gcg ggt gcg acg acc ttc acc aac gtt gcg gtg aat gtc ggc ctg acc 2323
Ala Gly Ala Thr Thr Phe Thr Asn Val Ala Val Asn Val Gly Leu Thr
730 735 740

gtt ctg gcg gct ccg acc ggt acg acg acc gtg acc ctg gcc aac gcc 2371
Val Leu Ala Ala Pro Thr Gly Thr Thr Thr Val Thr Leu Ala Asn Ala
745 750 755

acg ggc acc tcg gac gtg ttc aac ctg acc ctg tcg tcc tcg gcc gct 2419
Thr Gly Thr Ser Asp Val Phe Asn Leu Thr Leu Ser Ser Ser Ala Ala
760 765 770

ctg gcc gct ggt acg gtt gcg ctg gct ggc gtc gag acg gtg aac atc 2467
Leu Ala Ala Gly Thr Val Ala Leu Ala Gly Val Glu Thr Val Asn Ile
775 780 785

gcc gcc acc gac acc aac acg acc gct cac gtc gac acg ctg acg ctg 2515
Ala Ala Thr Asp Thr Asn Thr Thr Ala His Val Asp Thr Leu Thr Leu
790 795 800 805

caa gcc acc tcg gcc aag tcg atc gtg gtg acg ggc aac gcc ggt ctg 2563
Gln Ala Thr Ser Ala Lys Ser Ile Val Val Thr Gly Asn Ala Gly Leu
810 815 820

aac ctg acc aac acc ggc aac acg gct gtc acc agc ttc gac gcc agc 2611
Asn Leu Thr Asn Thr Gly Asn Thr Ala Val Thr Ser Phe Asp Ala Ser
825 830 835

gcc gtc acc ggc acg gct ccg gct gtg acc ttc gtg tcg gcc aac acc 2659
Ala Val Thr Gly Thr Ala Pro Ala Val Thr Phe Val Ser Ala Asn Thr
840 845 850

acg gtg ggt gaa gtc gtc acg atc cgc ggc ggc gct ggc gcc gac tcg 2707
Thr Val Gly Glu Val Val Thr Ile Arg Gly Gly Ala Gly Ala Asp Ser
855 860 865

ctg acc ggt tcg gcc acc gcc aat gac acc atc atc ggt ggc gct ggc 2755
Leu Thr Gly Ser Ala Thr Ala Asn Asp Thr Ile Ile Gly Gly Ala Gly
870 875 880 885

gct gac acc ctg gtc tac acc ggc ggt acg gac acc ttc acg ggt ggc 2803
Ala Asp Thr Leu Val Tyr Thr Gly Gly Thr Asp Thr Phe Thr Gly Gly
890 895 900

acg ggc gcg gat atc ttc gat atc aac gct atc ggc acc tcg acc gct 2851
Thr Gly Ala Asp Ile Phe Asp Ile Asn Ala Ile Gly Thr Ser Thr Ala
905 910 915

ttc gtg acg atc acc gac gcc gct gtc ggc gac aag ctc gac ctc gtc 2899
Phe Val Thr Ile Thr Asp Ala Ala Val Gly Asp Lys Leu Asp Leu Val
920 925 930

ggc atc tcg acg aac ggc gct atc gct gac ggc gcc ttc ggc gct gcg 2947
Gly Ile Ser Thr Asn Gly Ala Ile Ala Asp Gly Ala Phe Gly Ala Ala
935 940 945

gtc acc ctg ggc gct gct gcg acc ctg gct cag tac ctg gac gct gct 2995
Val Thr Leu Gly Ala Ala Ala Thr Leu Ala Gln Tyr Leu Asp Ala Ala
950 955 960 965

gct gcc ggc gac ggc agc ggc acc tcg gtt gcc aag tgg ttc cag ttc 3043
Ala Ala Gly Asp Gly Ser Gly Thr Ser Val Ala Lys Trp Phe Gln Phe
970 975 980

ggc ggc gac acc tat gtc gtc gtt gac agc tcg gct ggc gcg acc ttc 3091
Gly Gly Asp Thr Tyr Val Val Val Asp Ser Ser Ala Gly Ala Thr Phe
985 990 995

gtc agc ggc gct gac gcg gtg atc aag ctg acc ggt ctg gtc acg ctg 3139
Val Ser Gly Ala Asp Ala Val Ile Lys Leu Thr Gly Leu Val Thr Leu
1000 1005 1010

acc acc tcg gcc ttc gcc acc gaa gtc ctg acg ctc gcc t aagcgaacgt 3189
Thr Thr Ser Ala Phe Ala Thr Glu Val Leu Thr Leu Ala
1015 1020 1025

ctgatcctcg cctaggcgag gatcgctaga ctaagagacc ccgtcttccg aaagggaggc 3249
ggggtctttc ttatgggcgc tacgcgctgg ccggccttgc ctagttccgg t 3300

<210> 5 
<211> 1026 
<212> PRT 

<213> Caulobacter crescentus 
<400> 5 

Met Ala Tyr Thr Thr Ala Gin Leu Val Thr Ala Tyr Thr Asn Ala Asn 

15 10 15 

Leu Gly Lys Ala Pro Asp Ala Ala Thr Thr Leu Thr Leu Asp Ala Tyr 



3043 
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20 25 30 

Ala Thr Gin Thr Gin Thr Gly Gly Leu Ser Asp Ala Ala Ala Leu Thr 

35 40 45 

Asn Thr Leu Lys Leu Val Asn Ser Thr Thr Ala Val Ala He Gin Thr 

50 55 60 

Tyr Gin Phe Phe Thr Gly Val Ala Pro Ser Ala Ala Gly Leu Asp Phe 
65 70 75 80 

Leu Val Asp Ser Thr Thr Asn Thr Asn Asp Leu Asn Asp Ala Tyr Tyr 

85 90 95 

Ser Lys Phe Ala Gin Glu Asn Arg Phe He Asn Phe Ser He Asn Leu 

100 105 110 

Ala Thr Gly Ala Gly Ala Gly Ala Thr Ala Phe Ala Ala Ala Tyr Thr 

115 120 125 

Gly Val Ser Tyr Ala Gin Thr Val Ala Thr Ala Tyr Asp Lys He He 

130 135 140 

Gly Asn Ala Val Ala Thr Ala Ala Gly Val Asp Val Ala Ala Ala Val 
145 150 155 160 

Ala Phe Leu Ser Arg Gin Ala Asn He Asp Tyr Leu Thr Ala Phe Val 

165 170 175 

Arg Ala Asn Thr Pro Phe Thr Ala Ala Ala Asp He Asp Leu Ala Val 
180 185 190 

^ Lys Ala Ala Leu He Gly Thr He Leu Asn Ala Ala Thr Val Ser Gly 
O 195 200 205 

W He Gly Gly Tyr Ala Thr Ala Thr Ala Ala Met He Asn Asp Leu Ser 
\| 210 215 220 

*S Asp Gly Ala Leu Ser Thr Asp Asn Ala Ala Gly Val Asn Leu Phe Thr 
flf 225 230 235 240 

Q Ala Tyr Pro Ser Ser Gly Val Ser Gly Ser Thr Leu Ser Leu Thr Thr 

245 250 255 

■rf Gly Thr Asp Thr Leu Thr Gly Thr Ala Asn Asn Asp Thr Phe Val Ala 
^ 260 265 270 

l„ Gly Glu Val Ala Gly Ala Ala Thr Leu Thr Val Gly Asp Thr Leu Ser 
!| 275 280 285 

£ Gly Gly Ala Gly Thr Asp Val Leu Asn Trp Val Gin Ala Ala Ala Val 
||! 290 295 300 

fit Thr Ala Leu Pro Thr Gly Val Thr He Ser Gly He Glu Thr Met Asn 
S| 305 310 315 320 

1? Val Thr Ser Gly Ala Ala He Thr Leu Asn Thr Ser Ser Gly Val Thr 
P * 325 330 335 

Gly Leu Thr Ala Leu Asn Thr Asn Thr Ser Gly Ala Ala Gin Thr Val 

340 345 350 

Thr Ala Gly Ala Gly Gin Asn Leu Thr Ala Thr Thr Ala Ala Gin Ala 

355 360 365 

Ala Asn Asn Val Ala Val Asp Gly Arg Ala Asn Val Thr Val Ala Ser 

370 375 380 

Thr Gly Val Thr Ser Gly Thr Thr Thr Val Gly Ala Asn Ser Ala Ala 
385 390 395 400 

Ser Gly Thr Val Ser Val Ser Val Ala Asn Ser Ser Thr Thr Thr Thr 

405 410 415 

Gly Ala He Ala Val Thr Gly Gly Thr Ala Val Thr Val Ala Gin Thr 

420 425 430 

Ala Gly Asn Ala Val Asn Thr Thr Leu Thr Gin Ala Asp Val Thr Val 

435 440 445 

Thr Gly Asn Ser Ser Thr Thr Ala Val Thr Val Thr Gin Thr Ala Ala 

450 455 460 

Ala Thr Ala Gly Ala Thr Val Ala Gly Arg Val Asn Gly Ala Val Thr 
465 470 475 480 
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He Thr Asp Ser Ala Ala Ala Ser Ala Thr Thr Ala Gly Lys He Ala 

485 490 495 

Thr Val Thr Leu Gly Ser Phe Gly Ala Ala Thr He Asp Ser Ser Ala 

500 505 510 

Leu Thr Thr Val Asn Leu Ser Gly Thr Gly Thr Ser Leu Gly He Gly 

515 520 525 

Arg Gly Ala Leu Thr Ala Thr Pro Thr Ala Asn Thr Leu Thr Leu Asn 

530 535 540 

Val Asn Gly Leu Thr Thr Thr Gly Ala He Thr Asp Ser Glu Ala Ala 
545 550 555 560 

Ala Asp Asp Gly Phe Thr Thr He Asn He Ala Gly Ser Thr Ala Ser 

565 570 575 

Ser Thr He Ala Ser Leu Val Ala Ala Asp Ala Thr Thr Leu Asn He 

580 585 590 

Ser Gly Asp Ala Arg Val Thr He Thr Ser His Thr Ala Ala Ala Leu 

595 600 605 

Thr Gly He Thr Val Thr Asn Ser Val Gly Ala Thr Leu Gly Ala Glu 

610 615 620 

Leu Ala Thr Gly Leu Val Phe Thr Gly Gly Ala Gly Arg Asp Ser He 
625 630 635 640 

Leu Leu Gly Ala Thr Thr Lys Ala He Val Met Gly Ala Gly Asp Asp 
645 650 655 

S3 Thr Val Thr Val Ser Ser Ala Thr Leu Gly Ala Gly Gly Ser Val Asn 
M3 660 665 670 

Sj Gly Gly Asp Gly Thr Asp Val Leu Val Ala Asn Val Asn Gly Ser Ser 
S 675 680 685 

iiji Phe Ser Ala Asp Pro Ala Phe Gly Gly Phe Glu Thr Leu Arg Val Ala 
Jfl 690 695 700 

/I Gly Ala Ala Ala Gin Gly Ser His Asn Ala Asn Gly Phe Thr Ala Leu 
705 710 715 720 

— Gin Leu Gly Ala Thr Ala Gly Ala Thr Thr Phe Thr Asn Val Ala Val 
% 725 730 735 

B Asn Val Gly Leu Thr Val Leu Ala Ala Pro Thr Gly Thr Thr Thr Val 
45 740 745 750 

f|I Thr Leu Ala Asn Ala Thr Gly Thr Ser Asp Val Phe Asn Leu Thr Leu 
Ip 755 760 765 

S| Ser Ser Ser Ala Ala Leu Ala Ala Gly Thr Val Ala Leu Ala Gly Val 
rf 770 775 780 

m Glu Thr Val Asn He Ala Ala Thr Asp Thr Asn Thr Thr Ala His Val 
785 790 795 800 

Asp Thr Leu Thr Leu Gin Ala Thr Ser Ala Lys Ser He Val Val Thr 

805 810 815 

Gly Asn Ala Gly Leu Asn Leu Thr Asn Thr Gly Asn Thr Ala Val Thr 

820 825 830 

Ser Phe Asp Ala Ser Ala Val Thr Gly Thr Ala Pro Ala Val Thr Phe 

835 840 845 

Val Ser Ala Asn Thr Thr Val Gly Glu Val Val Thr He Arg Gly Gly 

850 855 860 

Ala Gly Ala Asp Ser Leu Thr Gly Ser Ala Thr Ala Asn Asp Thr He 
865 870 875 880 

He Gly Gly Ala Gly Ala Asp Thr Leu Val Tyr Thr Gly Gly Thr Asp 

885 890 895 

Thr Phe Thr Gly Gly Thr Gly Ala Asp He Phe Asp He Asn Ala He 

900 905 910 

Gly Thr Ser Thr Ala Phe Val Thr He Thr Asp Ala Ala Val Gly Asp 

915 920 925 

Lys Leu Asp Leu Val Gly He Ser Thr Asn Gly Ala He Ala Asp Gly 
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930 935 940 

Ala Phe Gly Ala Ala Val Thr Leu Gly Ala Ala Ala Thr Leu Ala Gin 
945 950 955 960 

Tyr Leu Asp Ala Ala Ala Ala Gly Asp Gly Ser Gly Thr Ser Val Ala 

965 970 975 

Lys Trp Phe Gin Phe Gly Gly Asp Thr Tyr Val Val Val Asp Ser Ser 

980 985 990 

Ala Gly Ala Thr Phe Val Ser Gly Ala Asp Ala Val He Lys Leu Thr 

995 1000 1005 

Gly Leu Val Thr Leu Thr Thr Ser Ala Phe Ala Thr Glu Val Leu Thr 

1010 1015 1020 

Leu Ala 
1025 

<210> 6 

<211> 306 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetically generated polynucleotide 

<221> CDS
<222> (1)...(306)

<400> 6
gaa ttc aga tct cag ggc gcg ggg cag ggt ggc tat ggt ggg ctc ggc 48
Glu Phe Arg Ser Gln Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly
1 5 10 15

tcg caa ggc gct ggc ctg ggt ggc cag ggc gct ggc gcg gcc gcg gcc 96
Ser Gln Gly Ala Gly Leu Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala
20 25 30

gct gcg gcc ggt ggc gct ggc cag ggc ggg ctg ggc tcg cag ggc gcc 144
Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Leu Gly Ser Gln Gly Ala
35 40 45

ggc caa ggc gct ggc gcc gcg gcc gct gcg gcc ggt ggc gcc ggc cag 192
Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln
50 55 60

ggt ggc tac ggc ggc ctg ggc agc cag ggc gcc ggt cgc ggc ggt cag 240
Gly Gly Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly Gly Gln
65 70 75 80

ggc gcc ggt gcc gcg gcc gct gcg gcc ggt ggc gct ggg caa ggc ggc 288
Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly
85 90 95

tac ggc ggt ctg gga tcc 306
Tyr Gly Gly Leu Gly Ser
100



<210> 7 
<211> 102 
<212> PRT
<213> Artificial Sequence

<220>
<223> Synthetically generated polypeptide

<400> 7
Glu Phe Arg Ser Gln Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly
1 5 10 15

Ser Gln Gly Ala Gly Leu Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala
20 25 30

Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Leu Gly Ser Gln Gly Ala
35 40 45

Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln
50 55 60

Gly Gly Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly Gly Gln
65 70 75 80

Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly
85 90 95

Tyr Gly Gly Leu Gly Ser
100

<210> 8
<211> 780
<212> DNA
<213> Infectious Pancreatic Necrosis Virus

<220>
<221> CDS
<222> (1)...(780)

<400> 8
atg aac aca aac aag
Met Asn Thr Asn Lys
1 5

cca gag act gga cca
Pro Glu Thr Gly Pro
20

atc tta aaa caa gag
Ile Leu Lys Gln Glu
35

gga agt ggc att ctt
Gly Ser Gly Ile Leu
50

ggt gca cac tac aga
Gly Ala His Tyr Arg
65

cag tgg ctg gag acg
Gln Trp Leu Glu Thr
85

agg ctg atc tca agg aaa tac gac att caa agc tcc aca cta ccg gcc 336
Arg Leu Ile Ser Arg Lys Tyr Asp Ile Gln Ser Ser Thr Leu Pro Ala
100 105 110

ggt ctc tat gct ctg aac ggg acg ctc aac gct gcc acc ttc gaa ggc 384
Gly Leu Tyr Ala Leu Asn Gly Thr Leu Asn Ala Ala Thr Phe Glu Gly
115 120 125

agt ctg tct gag gtg gag agc ctg acc tac aat agc ctg atg tcc cta 432
Ser Leu Ser Glu Val Glu Ser Leu Thr Tyr Asn Ser Leu Met Ser Leu
130 135 140

act acg aac ccc cag gac aaa gcc aac aac cag ctg gtg acc aaa gga 480
Thr Thr Asn Pro Gln Asp Lys Ala Asn Asn Gln Leu Val Thr Lys Gly
145 150 155 160

gtc acc gtc ctg aat cta cca aca ggg ttc gac aaa cca tac gtc cgc 528
Val Thr Val Leu Asn Leu Pro Thr Gly Phe Asp Lys Pro Tyr Val Arg
165 170 175

cta gag gac gag aca ccc cag ggt ctc cag tca atg aac ggg gcc agg 576
Leu Glu Asp Glu Thr Pro Gln Gly Leu Gln Ser Met Asn Gly Ala Arg
180 185 190

ctg agg tgc aca gct gca att gca cca cgg agg tac gag atc gac ctc 624
Leu Arg Cys Thr Ala Ala Ile Ala Pro Arg Arg Tyr Glu Ile Asp Leu
195 200 205

cca tcc caa agc cta ccc ccc gtt cct gcg aca gga acc ctc acc act 672
Pro Ser Gln Ser Leu Pro Pro Val Pro Ala Thr Gly Thr Leu Thr Thr
210 215 220

ctc tac gag gga aac gcc gac atc gtc agc tcc aca aca gtg acg gga 720
Leu Tyr Glu Gly Asn Ala Asp Ile Val Ser Ser Thr Thr Val Thr Gly
225 230 235 240

gac ata aac ttc agt ctg gca gaa cga ccc gca aac gag acc agg ttc 768
Asp Ile Asn Phe Ser Leu Ala Glu Arg Pro Ala Asn Glu Thr Arg Phe
245 250 255

gac ttc cag ctg 780
Asp Phe Gln Leu
260



<210> 9 
<211> 260 
<212> PRT 

<213> Infectious Pancreatic Necrosis Virus 
<400> 9 

Met Asn Thr Asn Lys Ala Thr Ala Thr Tyr Leu Lys Ser Ile Met Leu
1 5 10 15

Pro Glu Thr Gly Pro Ala Ser Ile Pro Asp Asp Ile Thr Glu Arg His
20 25 30

Ile Leu Lys Gln Glu Thr Ser Ser Tyr Asn Leu Glu Val Ser Glu Ser
35 40 45

Gly Ser Gly Ile Leu Val Cys Phe Pro Gly Ala Pro Gly Ser Arg Ile
50 55 60

Gly Ala His Tyr Arg Trp Asn Ala Asn Gln Thr Gly Leu Glu Phe Asp
65 70 75 80

Gln Trp Leu Glu Thr Ser Gln Asp Leu Lys Lys Ala Phe Asn Tyr Gly
85 90 95

Arg Leu Ile Ser Arg Lys Tyr Asp Ile Gln Ser Ser Thr Leu Pro Ala
100 105 110

Gly Leu Tyr Ala Leu Asn Gly Thr Leu Asn Ala Ala Thr Phe Glu Gly
115 120 125

Ser Leu Ser Glu Val Glu Ser Leu Thr Tyr Asn Ser Leu Met Ser Leu
130 135 140

Thr Thr Asn Pro Gln Asp Lys Ala Asn Asn Gln Leu Val Thr Lys Gly
145 150 155 160

Val Thr Val Leu Asn Leu Pro Thr Gly Phe Asp Lys Pro Tyr Val Arg
165 170 175

Leu Glu Asp Glu Thr Pro Gln Gly Leu Gln Ser Met Asn Gly Ala Arg
180 185 190

Leu Arg Cys Thr Ala Ala Ile Ala Pro Arg Arg Tyr Glu Ile Asp Leu
195 200 205

Pro Ser Gln Ser Leu Pro Pro Val Pro Ala Thr Gly Thr Leu Thr Thr
210 215 220

Leu Tyr Glu Gly Asn Ala Asp Ile Val Ser Ser Thr Thr Val Thr Gly
225 230 235 240

Asp Ile Asn Phe Ser Leu Ala Glu Arg Pro Ala Asn Glu Thr Arg Phe
245 250 255

Asp Phe Gln Leu
260

<210> 10
<211> 131
<212> PRT
<213> Escherichia coli

<400> 10
Phe Ala Cys Lys Thr Ala Asn Gly Thr Ala Ile Pro Ile Gly Gly Gly
1 5 10 15

Ser Ala Asn Val Tyr Val Asn Leu Ala Pro Val Val Asn Val Gly Gln
20 25 30

Asn Leu Val Val Asp Leu Ser Thr Gln Ile Phe Cys His Asn Asp Tyr
35 40 45

Pro Glu Thr Ile Thr Asp Tyr Val Thr Leu Gln Arg Gly Ser Ala Ser
50 55 60

Tyr Pro Phe Pro Thr Thr Ser Glu Thr Pro Arg Val Val Tyr Asn Ser
65 70 75 80

Arg Thr Asp Lys Pro Trp Pro Val Ala Leu Tyr Leu Thr Pro Val Ser
85 90 95

Ser Ala Gly Gly Val Ala Ile Lys Ala Gly Ser Leu Ile Ala Val Leu
100 105 110

Ile Leu Arg Gln Thr Asn Asn Tyr Asn Ser Asp Asp Phe Gln Cys Asp
115 120 125

Val Ser Ala
130



<210> 11 
<211> 131 
<212> PRT 

<213> Escherichia coli 
<400> 11
Phe Ala Cys Lys Thr Ala Asn Gly Thr Ala Ile Pro Ile Gly Gly Gly
1 5 10 15

Ser Ala Asn Val Tyr Val Asn Leu Ala Pro Val Val Asn Val Gly Gln
20 25 30

Asn Leu Val Val Asp Leu Ser Thr Gln Ile Phe Cys His Asn Asp Tyr
35 40 45

Pro Glu Thr Ile Thr Asp Tyr Val Thr Leu Gln Arg Gly Ser Ala Ser
50 55 60

Tyr Pro Phe Pro Thr Thr Ser Glu Thr Pro Arg Val Val Tyr Asn Ser
65 70 75 80

Arg Thr Asp Lys Pro Trp Pro Val Ala Leu Tyr Leu Thr Pro Val Ser
85 90 95

Ser Ala Gly Gly Val Ala Ile Lys Ala Gly Ser Leu Ile Ala Val Leu
100 105 110

Ile Leu Arg Gln Thr Asn Asn Tyr Asn Ser Asp Asp Phe Gln Cys Asp
115 120 125

Val Ser Ala
130



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Appendix 1 



PCT/CA99/00637



GCTATTGTCG ACGTATGACG TTTGCTCTAT AGCCATCGCT GCTCCCATGC GCGCCACTCG 60

GTCGCAGGGG GTGTGGGATT TTTTTTGGGA GACAATCCTC ATGGCCTATA CGACGGCCCA 120

GTTGGTGACT GCCTACACCA ACGCCAACCT CGGCAAGGCG CCTGACGCCG CCACCACGCT 180

GACGCTCGAC GCGTACGCGA CTCAAACCCA GACGGGCGGC CTCTCGGACG CCGCTGCGCT 240

GACCAACACC CTGAAGCTGG TCAACAGCAC GACGGCTGTT GCCATCCAGA CCTACCAGTT 300

CTTCACCGGC GTTGCCCCGT CGGCCGCTGG TCTGGACTTC CTGGTCGACT CGACCACCAA 360 

CACCAACGAC CTGAACGACG CGTACTACTC GAAGTTCGCT CAGGAAAACC GCTTCATCAA 420

CTTCTCGATC AACCTGGCCA CGGGCGCCGG CGCCGGCGCG ACGGCTTTCG CCGCCGCCTA 480 

CACGGGCGTT TCGTACGCCC AGACGGTCGC CACCGCCTAT GACAAGATCA TCGGCAACGC 540

CGTCGCGACC GCCGCTGGCG TCGACGTCGC GGCCGCCGTG GCTTTCCTGA GCCGCCAGGC 600

CAACATCGAC TACCTGACCG CCTTCGTGCG CGCCAACACG CCGTTCACGG CCGCTGCCGA 660

CATCGATCTG GCCCTCAAGG CCGCCCTGAT CGGCACCATC CTGAACGCCG CCACGGTGTC 720

GGGCATCGGT GGTTACGCGA CCGCCACGGC CGCGATGATC AACGACCTGT CGGACGGCGC 780

CCTGTCGACC GACAACGCGG CTGGCGTGAA CCTGTTCACC GCCTATCCGT CGTCGGGCGT 840

GTCGGGTTCG ACCCTCTCGC TGACCACCGG CACCGACACC CTGACGGGCA CCGCCAACAA 900

CGACACGTTC GTTGCGGGTG AAGTCGCCGG CGCTGCGACC CTGACCGTTG GCGACACCCT 960

GAGCGGCGGT GCTGGCACCG ACGTCCTGAA CTGGGTGCAA GCTGCTGCGG TTACGGCTCT 1020 

GCCGACCGGC GTGACGATCT CGGGCATCGA AACGATGAAC GTGACGTCGG GCGCTGCGAT 1080

CACCCTGAAC ACGTCTTCGG GCGTGACGGG TCTGACCGCC CTGAACACCA ACACCAGCGG 1140

CGCGGCTCAA ACCGTCACCG CCGGCGCTGG CCAGAACCTG ACCGCCACGA CCGCCGCTCA 1200 

AGCCGCGAAC AACGTCGCCG TCGACGGGCG CGCCAACGTC ACCGTCGCCT CGACGGGCGT 1260 

GACCTCGGGC ACGACCACGG TCGGCGCCAA CTCGGCCGCT TCGGGCACCG TGTCGGTGAG 1320 

CGTCGCGAAC TCGAGCACGA CCACCACGGG CGCTATCGCC GTGACCGGTG GTACGGCCGT 1380 

GACCGTGGCT CAAACGGCCG GCAACGCCGT GAACACCACG TTGACGCAAG CCGACGTGAC 1440

CGTGACCGGT AACTCCAGCA CCACGGCCGT GACGGTCACC CAAACCGCCG CCGCCACGGC 1500 

CGGCGCTACG GTCGCCGGTC GCGTCAACGG CGCTGTGACG ATCACCGACT CTGCCGCCGC 1560 

CTCGGCCACG ACCGCCGGCA AGATCGCCAC GGTCACCCTG GGCAGCTTCG GCGCCGCCAC 1620 

GATCGACTCG AGCGCTCTGA CGACCGTCAA CCTGTCGGGC ACGGGCACCT CGCTCGGCAT 1680 
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CGGCCGCGGC GCTCTGACCG CCACGCCGAC CGCCAACACC CTGACCCTGA ACGTCAATGG 1740 

TCTGACGACG ACCGGCGCGA TCACGGACTC GGAAGCGGCT GCTGACGATG GTTTCACCAC 1800 

CATCAACATC GCTGGTTCGA CCGCCTCTTC GACGATCGCC AGCCTGGTGG CCGCCGACGC 1860

GACGACCCTG AACATCTCGG GCGACGCTCG CGTCACGATC ACCTCGCACA CCGCTGCCGC 1920

CCTGACGGGC ATCACGGTGA CCAACAGCGT TGGTGCGACC CTCGGCGCCG AACTGGCGAC 1980 

CGGTCTGGTC TTCACGGGCG GCGCTGGCCG TGACTCGATC CTGCTGGGCG CCACGACCAA 2040

GGCGATCGTC ATGGGCGCCG GCGACGACAC CGTCACCGTC AGCTCGGCGA CCCTGGGCGC 2100 

TGGTGGTTCG GTCAACGGCG GCGACGGCAC CGACGTTCTG GTGGCCAACG TCAACGGTTC 2160 

GTCGTTCAGC GCTGACCCGG CCTTCGGCGG CTTCGAAACC CTCCGCGTCG CTGGCGCGGC 2220 

GGCTCAAGGC TCGCACAACG CCAACGGCTT CACGGCTCTG CAACTGGGCG CGACGGCGGG 2280 

TGCGACGACC TTCACCAACG TTGCGGTGAA TGTCGGCCTG ACCGTTCTGG CGGCTCCGAC 2340

CGGTACGACG ACCGTGACCC TGGCCAACGC CACGGGCACC TCGGACGTGT TCAACCTGAC 2400

CCTGTCGTCC TCGGCCGCTC TGGCCGCTGG TACGGTTGCG CTGGCTGGCG TCGAGACGGT 2460

GAACATCGCC GCCACCGACA CCAACACGAC CGCTCACGTC GACACGCTGA CGCTGCAAGC 2520

CACCTCGGCC AAGTCGATCG TGGTGACGGG CAACGCCGGT CTGAACCTGA CCAACACCGG 2580

CAACACGGCT GTCACCAGCT TCGACGCCAG CGCCGTCACC GGCACGGCTC CGGCTGTGAC 2640

CTTCGTGTCG GCCAACACCA CGGTGGGTGA AGTCGTCACG ATCCGCGGCG GCGCTGGCGC 2700

CGACTCGCTG ACCGGTTCGG CCACCGCCAA TGAGACCATC ATCGGTGGCG CTGGCGCTGA 2760

CACCCTGGTC TACACCGGCG GTACGGACAC CTTCACGGGT GGCACGGGCG CGGATATCTT 2820 

CGATATCAAC GCTATCGGCA CCTCGACCGC TTTCGTGACG ATCACCGACG CCGCTGTCGG 2880

CGACAAGCTC GACCTCGTCG GCATCTCGAC GAACGGCGCT ATCGCTGACG GCGCCTTCGG 2940

CGCTGCGGTC ACCCTGGGCG CTGCTGCGAC CCTGGCTCAG TACCTGGACG CTGCTGCTGC 3000 

CGGCGACGGC AGCGGCACCT CGGTTGCCAA GTGGTTCCAG TTCGGCGGCG ACACCTATGT 3060 

CGTCGTTGAC AGCTCGGCTG GCGCGACCTT CGTCAGCGGC GCTGACGCGG TGATCAAGCT 3120 

GACCGGTCTG GTCACGCTGA CCACCTCGGC CTTCGCCACC GAAGTCCTGA CGCTCGCCTA 3180 

AGCGAACGTC TGATCCTCGC CTAGGCGAGG ATCGCTAGAC TAAGAGACCC CGTCTTCCGA 3240 

AAGGGAGGCG GGGTCTTTCT TATGGGCGCT ACGCGCTGGC CGGCCTTGCC TAGTTCCGGT 3300
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Met Ala Tyr Thr Thr Ala Gln Leu Val Thr Ala Tyr Thr Asn Ala Asn
1 5 10 15

Leu Gly Lys Ala Pro Asp Ala Ala Thr Thr Leu Thr Leu Asp Ala Tyr 

20 25 30 

Ala Thr Gln Thr Gln Thr Gly Gly Leu Ser Asp Ala Ala Ala Leu Thr
35 40 45 

Asn Thr Leu Lys Leu Val Asn Ser Thr Thr Ala Val Ala Ile Gln Thr
50 55 60 

Tyr Gln Phe Phe Thr Gly Val Ala Pro Ser Ala Ala Gly Leu Asp Phe
65 70 75 80 

Leu Val Asp Ser Thr Thr Asn Thr Asn Asp Leu Asn Asp Ala Tyr Tyr
85 90 95 

Ser Lys Phe Ala Gln Glu Asn Arg Phe Ile Asn Phe Ser Ile Asn Leu
100 105 110 

Ala Thr Gly Ala Gly Ala Gly Ala Thr Ala Phe Ala Ala Ala Tyr Thr 
115 120 125

Gly Val Ser Tyr Ala Gln Thr Val Ala Thr Ala Tyr Asp Lys Ile Ile
130 135 140 

Gly Asn Ala Val Ala Thr Ala Ala Gly Val Asp Val Ala Ala Ala Val 
145 150 155 160 

Ala Phe Leu Ser Arg Gln Ala Asn Ile Asp Tyr Leu Thr Ala Phe Val
165 170 175 

Arg Ala Asn Thr Pro Phe Thr Ala Ala Ala Asp Ile Asp Leu Ala Val
180 185 190 

Lys Ala Ala Leu Ile Gly Thr Ile Leu Asn Ala Ala Thr Val Ser Gly
195 200 205

Ile Gly Gly Tyr Ala Thr Ala Thr Ala Ala Met Ile Asn Asp Leu Ser
210 215 220 

Asp Gly Ala Leu Ser Thr Asp Asn Ala Ala Gly Val Asn Leu Phe Thr 
225 230 235 240 

Ala Tyr Pro Ser Ser Gly Val Ser Gly Ser Thr Leu Ser Leu Thr Thr
245 250 255

Gly Thr Asp Thr Leu Thr Gly Thr Ala Asn Asn Asp Thr Phe Val Ala 
260 265 270 

Gly Glu Val Ala Gly Ala Ala Thr Leu Thr Val Gly Asp Thr Leu Ser 
275 280 285 

Gly Gly Ala Gly Thr Asp Val Leu Asn Trp Val Gln Ala Ala Ala Val
290 295 300 

Thr Ala Leu Pro Thr Gly Val Thr Ile Ser Gly Ile Glu Thr Met Asn
305 310 315 320 

Val Thr Ser Gly Ala Ala Ile Thr Leu Asn Thr Ser Ser Gly Val Thr
325 330 335 

Gly Leu Thr Ala Leu Asn Thr Asn Thr Ser Gly Ala Ala Gln Thr Val
340 345 350 



WO 00/04170



Thr Ala Gly Ala Gly Gln Asn Leu Thr Ala Thr Thr Ala Ala Gln Ala
355 360 365 

Ala Asn Asn Val Ala Val Asp Gly Arg Ala Asn Val Thr Val Ala Ser 
370 375 380 

Thr Gly Val Thr Ser Gly Thr Thr Thr Val Gly Ala Asn Ser Ala Ala 
385 390 395 400

Ser Gly Thr Val Ser Val Ser Val Ala Asn Ser Ser Thr Thr Thr Thr 
405 410 415 

Gly Ala Ile Ala Val Thr Gly Gly Thr Ala Val Thr Val Ala Gln Thr
420 425 430 

Ala Gly Asn Ala Val Asn Thr Thr Leu Thr Gln Ala Asp Val Thr Val
435 440 445 

Thr Gly Asn Ser Ser Thr Thr Ala Val Thr Val Thr Gln Thr Ala Ala
450 455 460 

Ala Thr Ala Gly Ala Thr Val Ala Gly Arg Val Asn Gly Ala Val Thr 
465 470 475 480

Ile Thr Asp Ser Ala Ala Ala Ser Ala Thr Thr Ala Gly Lys Ile Ala
485 490 495 

Thr Val Thr Leu Gly Ser Phe Gly Ala Ala Thr Ile Asp Ser Ser Ala
500 505 510 

Leu Thr Thr Val Asn Leu Ser Gly Thr Gly Thr Ser Leu Gly Ile Gly
515 520 525

Arg Gly Ala Leu Thr Ala Thr Pro Thr Ala Asn Thr Leu Thr Leu Asn 
530 535 540 

Val Asn Gly Leu Thr Thr Thr Gly Ala Ile Thr Asp Ser Glu Ala Ala
545 550 555 560 



Ala Asp Asp Gly Phe Thr Thr Ile Asn Ile Ala Gly Ser Thr Ala Ser
565 570 575 

Ser Thr Ile Ala Ser Leu Val Ala Ala Asp Ala Thr Thr Leu Asn Ile
580 585 590 

Ser Gly Asp Ala Arg Val Thr Ile Thr Ser His Thr Ala Ala Ala Leu
595 600 605 

Thr Gly Ile Thr Val Thr Asn Ser Val Gly Ala Thr Leu Gly Ala Glu
610 615 620 

Leu Ala Thr Gly Leu Val Phe Thr Gly Gly Ala Gly Arg Asp Ser Ile
625 630 635 640

Leu Leu Gly Ala Thr Thr Lys Ala Ile Val Met Gly Ala Gly Asp Asp
645 650 655 

Thr Val Thr Val Ser Ser Ala Thr Leu Gly Ala Gly Gly Ser Val Asn 
660 665 670 

Gly Gly Asp Gly Thr Asp Val Leu Val Ala Asn Val Asn Gly Ser Ser 
675 680 685 

Phe Ser Ala Asp Pro Ala Phe Gly Gly Phe Glu Thr Leu Arg Val Ala 
690 695 700 
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Gly Ala Ala Ala Gln Gly Ser His Asn Ala Asn Gly Phe Thr Ala Leu

705 710 715 720 

Gln Leu Gly Ala Thr Ala Gly Ala Thr Thr Phe Thr Asn Val Ala Val
725 730 735 

Asn Val Gly Leu Thr Val Leu Ala Ala Pro Thr Gly Thr Thr Thr Val 
740 745 750 

Thr Leu Ala Asn Ala Thr Gly Thr Ser Asp Val Phe Asn Leu Thr Leu 
755 760 765 

Ser Ser Ser Ala Ala Leu Ala Ala Gly Thr Val Ala Leu Ala Gly Val 
770 775 780 

Glu Thr Val Asn Ile Ala Ala Thr Asp Thr Asn Thr Thr Ala His Val
785 790 795 800 

Asp Thr Leu Thr Leu Gln Ala Thr Ser Ala Lys Ser Ile Val Val Thr
805 810 815 

Gly Asn Ala Gly Leu Asn Leu Thr Asn Thr Gly Asn Thr Ala Val Thr
820 825 830 

Ser Phe Asp Ala Ser Ala Val Thr Gly Thr Ala Pro Ala Val Thr Phe 
835 840 845 

Val Ser Ala Asn Thr Thr Val Gly Glu Val Val Thr Ile Arg Gly Gly
850 855 860 

Ala Gly Ala Asp Ser Leu Thr Gly Ser Ala Thr Ala Asn Asp Thr Ile
865 870 875 880 

Ile Gly Gly Ala Gly Ala Asp Thr Leu Val Tyr Thr Gly Gly Thr Asp
885 890 895 

Thr Phe Thr Gly Gly Thr Gly Ala Asp Ile Phe Asp Ile Asn Ala Ile
900 905 910

Gly Thr Ser Thr Ala Phe Val Thr Ile Thr Asp Ala Ala Val Gly Asp
915 920 925

Lys Leu Asp Leu Val Gly Ile Ser Thr Asn Gly Ala Ile Ala Asp Gly
930 935 940 

Ala Phe Gly Ala Ala Val Thr Leu Gly Ala Ala Ala Thr Leu Ala Gln
945 950 955 960

Tyr Leu Asp Ala Ala Ala Ala Gly Asp Gly Ser Gly Thr Ser Val Ala
965 970 975 

Lys Trp Phe Gln Phe Gly Gly Asp Thr Tyr Val Val Val Asp Ser Ser
980 985 990 

Ala Gly Ala Thr Phe Val Ser Gly Ala Asp Ala Val Ile Lys Leu Thr
995 1000 1005 

Gly Leu Val Thr Leu Thr Thr Ser Ala Phe Ala Thr Glu Val Leu Thr 
1010 1015 1020 

Leu Ala 
1025 
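The nucleotide listing and the protein listing of Appendix 1 can be cross-checked by translation. The short Python sketch below is illustrative only: the helper names `clean_listing` and `translate` are hypothetical, the standard genetic code is assumed, and the example uses just the listing lines for positions 61 to 180. It also assumes those lines are free of OCR debris, since any stray A, C, G or T character left in a line would shift the reading frame.

```python
# Standard genetic code; codons enumerated with the first base outermost, in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def clean_listing(text: str) -> str:
    """Keep only A, C, G and T, dropping spaces, newlines and position numbers."""
    return "".join(ch for ch in text.upper() if ch in "ACGT")


def translate(dna: str, start: int = 0) -> str:
    """Translate whole codons from 0-based `start` until a stop codon or the end of `dna`."""
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Listing lines for positions 61-180 of Appendix 1 (position numbers are discarded).
listing = """
GTCGCAGGGG GTGTGGGATT TTTTTTGGGA GACAATCCTC ATGGCCTATA CGACGGCCCA 120
GTTGGTGACT GCCTACACCA ACGCCAACCT CGGCAAGGCG CCTGACGCCG CCACCACGCT 180
"""
dna = clean_listing(listing)
start = dna.find("ATGGCCTATACG")  # codons for Met Ala Tyr Thr
print(translate(dna, start))  # MAYTTAQLVTAYTNANLGKAPDAATT
```

The start is located by searching for the codons of the first residues of the protein listing (Met Ala Tyr Thr) rather than for the first ATG, because the 5' flanking sequence in the first listing line already contains ATG triplets that are not the start codon.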
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GAA TTC AGA TCT CAG GGC GCG GGG CAG GGT GGC TAT GGT GGG CTC GGC TCG CAA GGC GCT

EFRSQGAGQGGYGGLGSQGA

GGC CTG GGT GGC CAG GGC GCT GGC GCG GCC GCG GCC GCT GCG GCC GGT GGC

GLGGQGAGAAAAAAAGG

GCT GGC CAG GGC GGG CTG GGC TCG CAG GGC GCC GGC CAA GGC GCT GGC GCC GCG GCC GCT

AGQGGLGSQGAGQGAGAAAA

GCG GCC GGT GGC GCC GGC CAG GGT GGC TAC GGC GGC CTG GGC AGC CAG GGC GCC GGT CGC

AAGGAGQGGYGGLGSQGAGR

GGC GGT CAG GGC GCC GGT GCC GCG GCC GCT GCG GCC GGT GGC GCT GGG CAA GGC GGC TAC

GGQGAGAAAAAAGGAGQGGY

GGC GGT CTG GGA TCC

GGLGS
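Each row above pairs a run of codons with its one-letter translation. A minimal Python sketch of how such a row could be checked mechanically follows; `check_row` is a hypothetical helper, and the codon table is the standard genetic code. It reports every position where the printed letter differs from the translated codon, which is a quick way to catch transcription errors in listings of this kind.

```python
# Standard genetic code; codons enumerated with the first base outermost, in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def check_row(codon_line: str, printed: str) -> list[tuple[int, str, str, str]]:
    """Return (1-based position, codon, translated residue, printed residue) for each mismatch."""
    codons = codon_line.split()
    if len(codons) != len(printed):
        raise ValueError(f"{len(codons)} codons but {len(printed)} printed residues")
    mismatches = []
    for pos, (codon, letter) in enumerate(zip(codons, printed), start=1):
        expected = CODON_TABLE[codon.upper()]
        if expected != letter:
            mismatches.append((pos, codon, expected, letter))
    return mismatches


row = "GAA TTC AGA TCT CAG GGC GCG GGG CAG GGT GGC TAT GGT GGG CTC GGC TCG CAA GGC GCT"
print(check_row(row, "EFRSQGAGQGGYGGLGSQGA"))  # []
print(check_row("GGC CTG", "GR"))  # [(2, 'CTG', 'L', 'R')]
```

The second call uses a deliberately wrong letter to show the report format; the length check guards against rows where a codon or a letter was lost during transcription.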
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atg aac aca aac aag gca acc gca act tac ttg aaa tcc att atg ctt cca gac act

met asn thr asn lys ala thr ala thr tyr leu lys ser ile met leu pro glu thr gly

61/21 

cca gca agc atc ccg gac gac ata acg gag aga cac atc tta aaa caa gag acc tcg tca

pro ala ser ile pro asp asp ile thr glu arg his ile leu lys gln glu thr ser ser

121/41 

tac aac tta gag gtc tcc gaa tca gga agt ggc att ctt gtt tgt ttc cct ggg gca cca

tyr asn leu glu val ser glu ser gly ser gly ile leu val cys phe pro gly ala pro

181/61 

ggc tca cgg atc ggt gca cac tac aga tgg aat gcg aac cag acg ggg ctg gag ttc gac

gly ser arg ile gly ala his tyr arg trp asn ala asn gln thr gly leu glu phe asp

241/81 

cag tgg ctg gag acg tcg cag gac ctg aag aaa gcc ttc aac tac ggg agg oa atc tca

gln trp leu glu thr ser gln asp leu lys lys ala phe asn tyr gly arg leu ile ser

301/101 

agg aaa tac gac att caa agc tcc aca cta ccg gcc ggt ctc tat gct ctg aac rcg acg

arg lys tyr asp ile gln ser ser thr leu pro ala gly leu tyr ala leu asn gf> thr

361/121 

ctc aac gct gcc acc ttc gaa ggc agt ctg tct gag gtg gag agc ctg acc tac aat agc

leu asn ala ala thr phe glu gly ser leu ser glu val glu ser leu thr tyr asn ser

421/141 

ctg atg tcc cta act acg aac ccc cag gac aaa gcc aac aac cag ctg c^: acc aaa ggg

leu met ser leu thr thr asn pro gln asp lys ala asn asn gln leu *a< thr gly

481/161 

gtc acc gtc ctg aat cta cca aca ggg ttc gac aaa cca tac gtc ccc zia >ssz gac gag

val thr val leu asn leu pro thr gly phe asp lys pro tyr val arg leu glu asp glu

541/181 

aca ccc cag ggt ctc cag tca atg aac ggg gcc agg atg agg tgc aca gc: rca at! gca

thr pro gln gly leu gln ser met asn gly ala arg met arg cys thr ala ala ile ala

601/201 

cca cgg agg tac gag atc gac ctc cca tcc caa agc cta ccc ccc gtt cc: zcz aca 9$a

pro arg arg tyr glu ile asp leu pro ser gln ser leu pro pro val pro ala thr gly

661/221 

acc ctc acc act ctc tac gag gga aac gcc gac atc gtc agc tcc aca aca rrc acc gga

thr leu thr thr leu tyr glu gly asn ala asp ile val ser ser thr thr val thr gly

721/241 

gac ata aac ttc agt ctg gca gaa cga ccc gca aac gag acc agg ttc gac ic cac ctg

asp ile asn phe ser leu ala glu arg pro ala asn glu thr arg phe asp phe gln leu
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The T3 protein sequence is: 

FACKTANGTAIPIGGGSANVYVNLAPVVNVGQNLVVDLSTQIFCHNDYPETITDYVTLQRGSA
SYPFPTTSETPRVVYNSRTDKPWPVALYLTPVSSAGGVAIKAGSLIAVLILRQTNNYNSDDFQ
CDVSA

The T7 protein sequence is: 

FACKTANGTAIPIGGGSANVYVNLAPVVNVGQNLVVDLSTQIFCHNDYPETITDYVTLQRGSA
SYPFPTTSETPRVVYNSRTDKPWPVALYLTPVSSAGGVAIKAGSLIAVLILRQTNNYNSDDFQ
CDVSARDVTVTLPDYRGSVPIPLTVYCAKSQNLGYYLSGTHADAGNSIFTNTASFSPAQGVG 
GAVGTSAVSLGLTANYARTGGQVTAGNVQSIIGVTFVYQ 
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WHAT IS CLAIMED IS: 

1. A method of cleaving a fusion protein including a first component which comprises all
or part of a Caulobacter S-layer protein including a Caulobacter C-terminal secretion
signal, and a second component heterologous to Caulobacter, the fusion protein
containing at least one aspartate-proline dipeptide, wherein the method comprises
combining the fusion protein with an acid solution of a strength insufficient to
solubilize the fusion protein for a time sufficient for cleavage of the fusion protein at
said aspartate-proline dipeptide.

2. The method of claim 1 wherein an aspartate-proline dipeptide is situated between the
first and second components or adjacent a junction between the first and second
components.

3. The method of claim 1 or 2, wherein the acid solution has a pH of from about 1.5 to
about 2.5.

4. The method of claim 1 or 2, wherein the acid solution has a pH of about 1.65 to about
2.35.

5. The method of any one of claims 1-4 wherein the method is carried out at a 
temperature in the range of about 30° C. to about 50° C. 

6. The method of any one of claims 1-5, wherein the method further comprises 
separating products cleaved from the fusion protein. 

7. A method of preparing a DNA construct for expression of a fusion protein suitable for
use in the method of claim 1, wherein the method comprises joining an upstream
DNA segment including DNA heterologous to Caulobacter which encodes a protein
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of interest, to a downstream DNA segment including DNA for a Caulobacter C-terminal
secretion signal, wherein the downstream DNA segment does not encode an
aspartate-proline dipeptide, and wherein the upstream segment contains DNA
encoding an aspartate-proline dipeptide at or near an end of said upstream segment to
be joined to said downstream segment.

8. A method of preparing a fusion protein, comprising: 

(1) expressing a DNA construct prepared as described in claim 7 in Caulobacter; and

(2) recovering said fusion protein secreted by the Caulobacter.
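Claims 1, 2 and 7 turn on where aspartate-proline (Asp-Pro, one-letter "DP") dipeptides occur. The Python sketch below illustrates only the sequence bookkeeping under stated assumptions; it is not the claimed method itself. `dp_sites` and `acid_cleave` are hypothetical helpers that assume complete cleavage at every Asp-Pro bond and at no other bond, which idealizes the acid treatment. `construct_ok` is a hypothetical check of the claim 7 design rule: both DNA segments are assumed to be in frame from their first base, and "at or near an end" is modelled by an arbitrary `window` of C-terminal residues of the upstream segment, because the claims give no numerical limit.

```python
# Standard genetic code ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frame(dna: str) -> str:
    """Translate every complete codon of `dna` in frame 0 (stops appear as '*')."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def dp_sites(protein: str) -> list[int]:
    """0-based positions of Asp residues immediately followed by Pro."""
    return [i for i in range(len(protein) - 1) if protein[i:i + 2] == "DP"]


def acid_cleave(protein: str) -> list[str]:
    """Idealized products if every Asp-Pro bond, and only those, were hydrolysed.

    Asp stays at the C-terminus of the left fragment; Pro starts the next fragment.
    """
    fragments, start = [], 0
    for i in dp_sites(protein):
        fragments.append(protein[start:i + 1])
        start = i + 1
    fragments.append(protein[start:])
    return fragments


def construct_ok(upstream_dna: str, downstream_dna: str, window: int = 10) -> bool:
    """Hypothetical claim 7 check: no Asp-Pro encoded downstream, and an Asp-Pro
    lying entirely within the last `window` residues encoded upstream."""
    upstream = translate_frame(upstream_dna)
    downstream = translate_frame(downstream_dna)
    return not dp_sites(downstream) and "DP" in upstream[-window:]


print(acid_cleave("MKDPAGDPQ"))  # ['MKD', 'PAGD', 'PQ']
print(construct_ok("ATGGCTGATCCG", "GCTGGT"))  # True: MADP upstream, AG downstream
print(construct_ok("ATGGCTGATCCG", "GATCCAGGT"))  # False: downstream encodes DPG
```

The window default of 10 residues is an arbitrary illustration; a real design would set it from the intended cleavage products rather than from anything stated in the claims.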
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